
How to annotate the data points with the index or column value in the dataframe


I draw a normal distribution plot from a DataFrame, and then I want to annotate some of the data points in the plot with the index or a column value of those points. For example, take this DataFrame:

    import pandas as pd
    df = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [3, 7, 9]})

Then I draw a normal distribution plot using the values in 'col2'.
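
The question doesn't say how the plot is built, so here is a minimal sketch of one way to do it, assuming "normal distribution plot" means a normal density curve whose mean and (sample) standard deviation are estimated from 'col2', with each data point marked on the curve. The helper name plot_normal_fit is hypothetical, and the density is written out with NumPy rather than taken from SciPy:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_normal_fit(df, value_col):
    """Hypothetical helper: draw a normal PDF fitted to df[value_col] and
    mark every value of that column on the curve."""
    values = df[value_col].to_numpy(dtype=float)
    mu = values.mean()
    sigma = values.std(ddof=1)  # sample standard deviation, same as pandas' default

    def pdf(x):
        # Normal probability density with the fitted mean and standard deviation
        return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

    grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
    fig, ax = plt.subplots()
    ax.plot(grid, pdf(grid), label='fitted normal PDF')
    ax.scatter(values, pdf(values), zorder=3, label=value_col)  # the data points on the curve
    ax.legend()
    return fig, ax, pdf


df = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [3, 7, 9]})
fig, ax, pdf = plot_normal_fit(df, 'col2')
```

Call plt.show() afterwards to display the figure; returning the Axes and the fitted pdf keeps them available for adding annotations later.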

Now I want to annotate (label) some of the data points with the corresponding values in 'col1'. For example, I want to show the 'col1' value 'B' at the data point 7 in the normal distribution plot.
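
A minimal sketch of that specific annotation, again assuming the plot is a normal curve fitted to 'col2' (an assumption, not something the question states). It looks up the row whose 'col2' value is 7 and passes that row's 'col1' value to Matplotlib's Axes.annotate; the text offset and arrow style are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [3, 7, 9]})
mu, sigma = df['col2'].mean(), df['col2'].std()  # sample mean and standard deviation


def pdf(x):
    # Normal density with the fitted mean and standard deviation
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))


grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)
fig, ax = plt.subplots()
ax.plot(grid, pdf(grid))

# Pick the row whose col2 value is 7; its col1 value ('B') becomes the label.
# Use str(row.name) instead of row['col1'] to label the point with its DataFrame index.
row = df.loc[df['col2'].eq(7)].iloc[0]
x = row['col2']
ax.scatter([x], [pdf(x)], zorder=3)
ax.annotate(
    row['col1'],                   # text to draw
    xy=(x, pdf(x)),                # the data point being labelled
    xytext=(10, 10),               # where the text goes, relative to xy...
    textcoords='offset points',    # ...measured in points rather than data units
    arrowprops=dict(arrowstyle='->'),
)
```

Using an offset in points keeps the label clear of the marker regardless of the axis scale.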


Solution

  • Use Matplotlib's built-in annotate feature:

    import random
    import string

    import pandas as pd
    import matplotlib.pyplot as plt

    plt.rcParams["figure.figsize"] = (20, 10)

    # Mock DataFrame: 500 random capital letters in col1, 500 random integers 0-24 in col2
    col1 = [random.choice(string.ascii_uppercase) for _ in range(500)]
    col2 = [random.randrange(25) for _ in range(500)]
    df = pd.DataFrame({'col1': col1, 'col2': col2})

    target = 'A'                                       # The string you want to find (not named "string", which would shadow the module)
    index = df[df.col1.eq(target)].index               # The index label(s) of the rows where col1 equals target
    # index = df[df.col1.str.contains(target)].index   # If you are looking for a word or phrase inside a longer string, try this
    y, x, _ = plt.hist(df.col2)                        # Plot the histogram; y holds the bar heights (counts), x the bin edges
    plt.ylim(0, y.max() + 10)                          # Leave headroom above the tallest bar for the labels
    for pos in index:                                  # Annotate each matching row with the target string...
        plt.annotate(target, (df.at[pos, 'col2'], y.max() + 5), fontsize=20)  # ...at that row's col2 value on the x-axis
    plt.show()
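
The code above writes every label at the same height, above the tallest bar, so rows with equal 'col2' values end up drawn on top of each other. One possible variation, sketched here as a hypothetical helper annotate_matches (not part of Matplotlib or the original answer), puts each label just above the bar whose bin contains that row's 'col2' value and stacks repeated labels in the same bar. The seeded random data and the 5% gap between stacked labels are assumptions for illustration:

```python
import random
import string
from collections import Counter

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def annotate_matches(ax, df, label_col, value_col, target, bins=10):
    """Hypothetical helper: histogram df[value_col], then write `target` above
    the bar containing each row whose df[label_col] equals `target`."""
    counts, edges, _ = ax.hist(df[value_col], bins=bins)
    matches = df.loc[df[label_col].eq(target), value_col]
    # np.digitize returns 1-based bin numbers; clip so a value equal to the
    # right-most edge falls in the last bin, as hist() does
    bin_ids = np.clip(np.digitize(matches, edges) - 1, 0, len(counts) - 1)
    step = counts.max() * 0.05   # gap between stacked labels, in data units
    stacked = Counter()          # how many labels each bar already carries
    top = counts.max()
    for b in bin_ids:
        stacked[b] += 1
        x = (edges[b] + edges[b + 1]) / 2    # centre of the bar
        y = counts[b] + step * stacked[b]    # above the bar, or above the previous label
        ax.annotate(target, (x, y), ha='center')
        top = max(top, y)
    ax.set_ylim(0, top + step)               # keep every label inside the axes
    return counts, edges, bin_ids


# Mock data as in the answer, but seeded so the result is reproducible
rng = random.Random(0)
df = pd.DataFrame({
    'col1': [rng.choice(string.ascii_uppercase) for _ in range(500)],
    'col2': [rng.randrange(25) for _ in range(500)],
})
fig, ax = plt.subplots(figsize=(20, 10))
counts, edges, bin_ids = annotate_matches(ax, df, 'col1', 'col2', 'A')
```

Call plt.show() to display the result. Stacking in data units keeps the approach close to the original answer's y.max() + 5 offset while avoiding overlapping labels.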
    
