python pandas scatter-plot multi-index

Annotate scatter plot with multiindex


I have constructed a scatter plot using data from a DataFrame with a MultiIndex. The index levels are country and year:

fig,ax=plt.subplots(1,1)
rel_pib=welfare["rel_pib_pc"].loc[:,1960:2010].groupby("country").mean()
rel_lambda=welfare["Lambda"].loc[:,1960:2010].groupby("country").mean()
ax.scatter(rel_pib,rel_lambda)
ax.set_ylim(0,2)
ax.set_ylabel('Bienestar(Lambda)')
ax.set_xlabel('PIBPc')
ax.plot([0,1],'red', linewidth=1)

I would like to annotate each point with the country name (and, if possible, the Lambda value). I have the following code:

for i, txt in enumerate(welfare.index):
    plt.annotate(txt, (welfare["rel_pib_pc"].loc[:,1960:2010].groupby("country").mean()[i], welfare["Lambda"].loc[:,1960:2010].groupby("country").mean()[i]))

I am not sure how to indicate that I want the country names, since .mean() collapses all the Lambda and rel_pib_pc values for a given country into a single value.

I have tried using .xs(), but none of the combinations I tried worked.


Solution

  • I used the following test data:

                   rel_pib_pc  Lambda
    country  year                    
    Country1 2007         260    1.12
             2008         265    1.13
             2009         268    1.10
    Country2 2007         230    1.05
             2008         235    1.07
             2009         236    1.04
    Country3 2007         200    1.02
             2008         203    1.07
             2009         208    1.05
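
For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is a minimal sketch: the variable name welfare and the level names country and year are taken from the question, and the values are copied from the table.

```python
import pandas as pd

# Two-level index: every country paired with every year, in table order.
index = pd.MultiIndex.from_product(
    [["Country1", "Country2", "Country3"], [2007, 2008, 2009]],
    names=["country", "year"],
)

# Column values copied row by row from the test table.
welfare = pd.DataFrame(
    {
        "rel_pib_pc": [260, 265, 268, 230, 235, 236, 200, 203, 208],
        "Lambda": [1.12, 1.13, 1.10, 1.05, 1.07, 1.04, 1.02, 1.07, 1.05],
    },
    index=index,
)

print(welfare)
```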
    

    Then, to generate a scatter plot, I used the following code:

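    import matplotlib.pyplot as plt

    # Per-country means, computed once and reused for plotting and annotating
    # (with the real data, keep the .loc[:, 1960:2010] year filter from the question)
    rel_pib = welfare["rel_pib_pc"].groupby(level="country").mean()
    rel_lambda = welfare["Lambda"].groupby(level="country").mean()

    fig, ax = plt.subplots(1, 1)
    ax.scatter(rel_pib, rel_lambda)
    ax.set_ylabel('Bienestar(Lambda)')
    ax.set_xlabel('PIBPc')
    ax.set_xlim(190, 280)
    annot_dy = 0.005  # vertical offset of each label, in data units
    # The index of the grouped Series holds the country names
    for txt in rel_lambda.index:
        ax.annotate(txt, (rel_pib.loc[txt], rel_lambda.loc[txt] + annot_dy), ha='center')
    plt.show()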
    

    and got the following result:

    [Image: scatter plot with each point labelled by its country name]

    The trick to generating the annotations correctly is:

    • Iterate over the index of one of the already generated Series objects, so that txt contains the country name.
    • Take the values from the already generated Series objects (don't compute them again).
    • Look up both coordinates by the current index value.
    • To put the annotations just above their respective points:
      • set ha (horizontal alignment) to 'center',
      • shift the y coordinate up a little (if needed, experiment with other values of annot_dy).
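
The first point is the one that trips up the loop in the question, so here is a small sketch of the difference. The two-country frame is made up for illustration; only the index layout matches the question.

```python
import pandas as pd

# Hypothetical two-country frame with the same index layout as the question.
index = pd.MultiIndex.from_product(
    [["Country1", "Country2"], [2007, 2008]], names=["country", "year"]
)
welfare = pd.DataFrame({"Lambda": [1.12, 1.13, 1.05, 1.07]}, index=index)

# Iterating the original MultiIndex yields one (country, year) tuple per row.
row_labels = list(welfare.index)

# After groupby(...).mean() the index has a single level: the country names.
rel_lambda = welfare["Lambda"].groupby(level="country").mean()
country_labels = list(rel_lambda.index)

print(row_labels[0])    # a (country, year) tuple
print(country_labels)   # plain country names, one per plotted point
```

Looping over welfare.index therefore gives more labels than plotted points, and positional lookups like [i] no longer line up with the grouped means.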

    I also added ax.set_xlim(190,280) to keep the annotations within the plot area. You may not need it.
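
The question also asks for the Lambda value in the label. One possible way, sketched below on the same test data, is to build the label text with an f-string. The two-decimal format, the means frame and the labels list are my own choices for illustration, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Test data from the answer above.
index = pd.MultiIndex.from_product(
    [["Country1", "Country2", "Country3"], [2007, 2008, 2009]],
    names=["country", "year"],
)
welfare = pd.DataFrame(
    {
        "rel_pib_pc": [260, 265, 268, 230, 235, 236, 200, 203, 208],
        "Lambda": [1.12, 1.13, 1.10, 1.05, 1.07, 1.04, 1.02, 1.07, 1.05],
    },
    index=index,
)

# One row per country: the mean of each column over the years.
means = welfare.groupby(level="country").mean()

fig, ax = plt.subplots(1, 1)
ax.scatter(means["rel_pib_pc"], means["Lambda"])
ax.set_xlabel("PIBPc")
ax.set_ylabel("Bienestar(Lambda)")
ax.set_xlim(190, 280)

annot_dy = 0.005  # same vertical offset as above, in data units
labels = []       # kept only so the generated texts can be inspected
for country, row in means.iterrows():
    # Label = country name plus the mean Lambda rounded to two decimals.
    label = f"{country} ({row['Lambda']:.2f})"
    ax.annotate(label, (row["rel_pib_pc"], row["Lambda"] + annot_dy), ha="center")
    labels.append(label)

print(labels)
```

To see the figure, remove the matplotlib.use("Agg") line and call plt.show(), or save it with fig.savefig(...).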