python pandas matplotlib seaborn stem-plot

Lollipop plot for dataframe with two groups


I have the following dataframe:

                      Country  variable      value
0                 Afghanistan      Area  38.232510
1                 Afghanistan     Yield  70.081666
2                   Argentina      Area  96.776730
3                   Argentina      Area  60.047651
4                   Argentina     Yield  66.811117
..                        ...       ...        ...
133  United States Of America     Yield  53.536069
134  United States Of America      Area  76.975885
135  United States Of America     Yield  19.987656
136                    Zambia     Yield  39.493612
137                    Zambia     Yield  35.384809

I want to use it to construct a lollipop plot (e.g. https://python-graph-gallery.com/184-lollipop-plot-with-2-groups/). However, the example dataframe differs from mine: it has exactly two values for each group, while I want to plot the minimum and maximum value of each of the two groups (Area and Yield) for each country, with the groups differentiated by hue. How can I do this by modifying the code from that example?


Solution

  • This should make a good starting point to refine from:

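    import matplotlib.pyplot as plt
    
    # Min and max of 'value' per (Country, variable); droplevel flattens the column MultiIndex
    df_agg = df.groupby(['Country', 'variable']).agg(['min', 'max']).droplevel(level=0, axis=1).reset_index()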
    
    colours = { 
        'Area' : { 'line' : 'pink', 'min' : 'crimson', 'max' : 'red' }, 
        'Yield' : { 'line' : 'skyblue', 'min' : 'navy', 'max' : 'blue' }, 
    }
    
    # 'variables' rather than 'vars', to avoid shadowing the builtin
    variables = df_agg['variable'].unique()
    
    for var in variables:
        df_plt = df_agg[df_agg['variable'] == var]
        
        my_range = list(df_plt.index)
        
        plt.hlines(y=my_range, xmin=df_plt['min'], xmax=df_plt['max'], color=colours[var]['line'], alpha=0.4)
        plt.scatter(df_plt['min'], my_range, color=colours[var]['min'], alpha=1, label=f'{var} min')
        plt.scatter(df_plt['max'], my_range, color=colours[var]['max'], alpha=1, label=f'{var} max')
    
    # Add legend, title and axis names
    plt.legend()
    plt.yticks(df_agg.index, df_agg['Country'])
    plt.title("Min and Max per Country", loc='left')
    plt.xlabel('Values')
    plt.ylabel('Country')
    
    # Show the graph
    plt.show()
    

    For the data in your question, this gives: [image]
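To see what the aggregation step produces, here is a minimal sketch on a few made-up rows shaped like the dataframe in the question (the sample values are illustrative, not the real data). It selects the `value` column before aggregating, which is an alternative to the `droplevel` call and should yield the same flat `min` and `max` columns.

```python
import pandas as pd

# Hypothetical sample rows with the same columns as the question's dataframe
df = pd.DataFrame({
    "Country": ["Argentina", "Argentina", "Argentina", "Zambia", "Zambia"],
    "variable": ["Area", "Area", "Yield", "Yield", "Yield"],
    "value": [96.8, 60.0, 66.8, 39.5, 35.4],
})

# One row per (Country, variable) pair, holding the smallest and largest value
df_agg = (
    df.groupby(["Country", "variable"])["value"]
      .agg(["min", "max"])
      .reset_index()
)

print(df_agg)
```

A pair with a single observation, such as Argentina/Yield here, ends up with identical `min` and `max`, so its lollipop collapses to a single point.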

    You can also group the lollipops by country: give each country an integer y position, offset each variable's lollipop slightly above or below that integer, and put y ticks only at the integer positions:

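    import matplotlib.pyplot as plt
    import numpy as np
    
    # Min and max of 'value' per (Country, variable); droplevel flattens the column MultiIndex
    df_agg = df.groupby(['Country', 'variable']).agg(['min', 'max']).droplevel(level=0, axis=1).reset_index()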
    
    colours = { 
        'Area' : { 'line' : 'pink', 'min' : 'crimson', 'max' : 'red' }, 
        'Yield' : { 'line' : 'skyblue', 'min' : 'navy', 'max' : 'blue' }, 
    }
    
    countries = list(df_agg['Country'].unique())
    
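    # 'variables' rather than 'vars', to avoid shadowing the builtin
    variables = df_agg['variable'].unique()
    
    # Work out a y offset for each variable's lollipop,
    # spread evenly from y-0.2 to y+0.2 around the country's integer position
    plot_y = { var : pt for var, pt in zip(variables, np.linspace(-0.2, 0.2, num=len(variables))) }
        
    for var in variables: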
        df_plt = df_agg[df_agg['variable'] == var]
        
        my_range = list(df_plt['Country'].apply(countries.index) + plot_y[var])
        
        plt.hlines(y=my_range, xmin=df_plt['min'], xmax=df_plt['max'], color=colours[var]['line'], alpha=0.4)
        plt.scatter(df_plt['min'], my_range, color=colours[var]['min'], alpha=1, label=f'{var} min')
        plt.scatter(df_plt['max'], my_range, color=colours[var]['max'], alpha=1, label=f'{var} max')
    
    # Add legend, title and axis names
    plt.legend()
    plt.yticks(range(len(countries)), countries)
    plt.title("Min and Max per Country", loc='left')
    plt.xlabel('Values')
    plt.ylabel('Country')
    
    # Show the graph
    plt.show()
    

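The y-position arithmetic in the grouped version can be checked in isolation. The sketch below assumes two countries and two variables; the `y_position` helper is a hypothetical name introduced only for illustration and is not part of the answer's code.

```python
import numpy as np

# Hypothetical inputs: each country's list index is its integer y tick
countries = ["Argentina", "Zambia"]
variables = ["Area", "Yield"]

# Spread the variables evenly over [-0.2, 0.2] around each integer tick
offsets = dict(zip(variables, np.linspace(-0.2, 0.2, num=len(variables))))

def y_position(country, variable):
    """Integer slot of the country plus the small per-variable offset."""
    return countries.index(country) + offsets[variable]

for country in countries:
    for variable in variables:
        print(country, variable, round(y_position(country, variable), 2))
```

With only one variable, `np.linspace(-0.2, 0.2, num=1)` returns just the start value, so that single lollipop would sit at y-0.2 rather than on the tick; a fixed offset of 0 may be preferable in that case.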