I drew a normal distribution plot from a dataframe, and I want to annotate some of the data points in the plot with their index or with a value from another column. For example, here we have a dataframe:
df = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [3, 7, 9]})
Then I draw a normal distribution plot using the values in 'col2'.
Now I want to label some data points with the corresponding values in 'col1'. For example, I want to show the 'col1' text 'B' at the data point 7 in the normal distribution plot.
Use Matplotlib's built-in annotate feature. The example below annotates a histogram of 'col2':
# Just creating a mock dataframe
import random
import string
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (20, 10)

col1 = [random.choice(string.ascii_uppercase) for _ in range(500)]
col2 = [random.choice(range(0, 25)) for _ in range(500)]
df = pd.DataFrame({'col1': col1, 'col2': col2})

target = 'A'  # The value you want to find (named `target` so it does not shadow the `string` module)
index = df[df.col1.eq(target)].index  # The index labels of the rows where col1 equals target
# index = df[df.col1.str.contains(target)].index  # If you are looking for a word or phrase inside a string, try this
y, x, _ = plt.hist(df.col2)  # Plot the histogram; y holds the bin counts, x the bin edges
plt.ylim(0, y.max() + 10)  # Set the ylim to the max y value plus some headroom for the labels
for pos in index:  # Annotate what you want (we'll just use the target value) at the...
    plt.annotate(target, (df.loc[pos, 'col2'], y.max() + 5), fontsize=20)  # ...corresponding value in col2 at that index
plt.show()
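
The snippet above annotates a histogram. If you want the label to sit on a normal curve instead, as the question describes, something like the following may work. This is only a sketch under a few assumptions: the curve is a normal density fitted with the sample mean and standard deviation of 'col2', and `normal_pdf` and `plot_annotated_normal` are hypothetical helper names, not part of pandas or Matplotlib.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


def plot_annotated_normal(df, value_col, label_col, labels):
    """Plot a normal curve fitted to df[value_col] and label the selected rows.

    Hypothetical helper: `labels` lists the label_col values to annotate.
    """
    mean = df[value_col].mean()
    std = df[value_col].std()  # Sample standard deviation (pandas default, ddof=1)

    # Draw the fitted curve over mean +/- 4 standard deviations
    xs = np.linspace(mean - 4 * std, mean + 4 * std, 400)
    fig, ax = plt.subplots()
    ax.plot(xs, normal_pdf(xs, mean, std))

    # Mark each selected row on the curve and write its label next to the marker
    selected = df[df[label_col].isin(labels)]
    for _, row in selected.iterrows():
        x = row[value_col]
        y = normal_pdf(x, mean, std)
        ax.plot(x, y, "o", color="tab:red")
        ax.annotate(row[label_col], (x, y), xytext=(5, 5), textcoords="offset points")
    return fig, ax


# The dataframe from the question: label the point 7 with its col1 value 'B'
df = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [3, 7, 9]})
fig, ax = plot_annotated_normal(df, 'col2', 'col1', ['B'])
```

Call plt.show() afterwards to display the figure. The label is offset by a few points with xytext so that it does not cover the marker; pass more values in the list (for example ['A', 'B']) to annotate several points.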