
How to add entire dataframe row as scatter plot annotation

I'm plotting two columns of a Pandas DataFrame on a scatterplot and I want each point to show all the row values of the DataFrame. I've looked at this post, and tried to do something similar with mplcursors:

import pandas as pd
from datetime import date, datetime, time, timedelta
import numpy as np
import matplotlib.pyplot as plt
from mplcursors import cursor

df = pd.DataFrame()
df['datetime'] = pd.date_range(start='2016-01-01', end='2016-01-14', freq='30min')
#df = df.set_index('datetime')
df['x1'] = np.random.randint(-30, 30, size=len(df))
df['x2'] = np.random.randint(-30, 20, size=len(df))
df['x3'] = np.random.randint(-20, 30, size=len(df))
df['y1'] = np.random.randint(-100, 100, size=len(df))
df['y2'] = np.random.randint(-300, 200, size=len(df))
df['y3'] = np.random.randint(-200, 300, size=len(df))

def conditions(s):
    if (s['y1'] > 20) or (s['y3'] < 0):
        return 'group1'
    elif (s['x3'] < 20):
        return 'group2'
    elif (s['x2'] == 0):
        return 'group3'
    else:
        return 'group4'

df['category'] = df.apply(conditions, axis=1)

fig = plt.figure(figsize=(12,4))

ax1 = plt.subplot(121)
ax1.scatter(df.x1, df.y1, label='test1')
ax1.scatter(df.x2, df.y2, label='test2')
cr1 = cursor(ax1,hover=True)
#ax1.annotation_names = df.columns.tolist()
cr1.connect("add", lambda x: x.annotation.set_text(df.columns.tolist()[]))

ax2 = plt.subplot(122)
ax2.scatter(df.x1, df.y1, label='test1')
ax2.scatter(df.x3, df.y3, label='test3')
cr2 = cursor(ax2,hover=True)
#ax2.annotation_names = df.columns.tolist()
cr2.connect("add", lambda x: x.annotation.set_text(df.columns.tolist()[]))

# save figure (the context manager closes the file after writing)
import pickle
with open('FigureObject.fig.pickle', 'wb') as f:
    pickle.dump(fig, f)

When I hover over a point, I want to see a label containing (for example):

datetime = 2016-01-01 00:00:00 
x1 = 1 
x2 = -4 
x3 = 22 
y1 = -42 
y2 = -219 
y3 = -158    
category = group1

but I get this type of error:

cr2.connect("add", lambda x: x.annotation.set_text(df.columns.tolist()[]))
IndexError: list index out of range

How do I fix it?


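    • The IndexError occurs because the lambda indexes df.columns.tolist() with the hovered point's index
      • df.columns.tolist() is a list of only 8 column names, while the point index runs from 0 to 624 (one per row), so any point past the eighth one is out of range.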
    • df.iloc[x.index, :].to_dict() gets the hovered point's row as a dict of {column: value}
      • A list comprehension builds one 'column: value' string per key-value pair
      • '\n'.join(...) joins those strings into one label, with a newline between columns
    • In mplcursors v0.5.1, the older point-index attribute is deprecated; use Selection.index instead.
      • i.e. write df.iloc[x.index, :], where x is the Selection passed to the callback
    cr1.connect("add", lambda x: x.annotation.set_text(
        '\n'.join([f'{k}: {v}' for k, v in df.iloc[x.index, :].to_dict().items()])))
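    A minimal, GUI-free sketch of the two points above, assuming a small stand-in DataFrame with the same eight columns as the question (the values are made up for illustration). It shows that the column list has only eight entries, so indexing it with a point index such as 10 raises IndexError, while df.iloc[i, :] returns the whole row. The helper name row_label is hypothetical, not part of pandas or mplcursors.

```python
import pandas as pd

# Stand-in data with the same eight columns as the question (values are arbitrary).
n = 20
df = pd.DataFrame({
    "datetime": pd.date_range("2016-01-01", periods=n, freq="30min"),
    "x1": range(n), "x2": range(n), "x3": range(n),
    "y1": range(n), "y2": range(n), "y3": range(n),
    "category": ["group1"] * n,
})

columns = df.columns.tolist()  # 8 column names: one entry per column, not per row

try:
    columns[10]  # a point index above 7 is out of range for this list
except IndexError as exc:
    print("IndexError:", exc)

def row_label(frame: pd.DataFrame, i: int) -> str:
    """Hypothetical helper: format row i as one 'column: value' pair per line."""
    row = frame.iloc[i, :].to_dict()  # {column name: value} for a single row
    return "\n".join(f"{k}: {v}" for k, v in row.items())

print(row_label(df, 10))
```

    The same expression is what the lambda above evaluates for each hovered point.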


    • Alternatively, use Series.to_string(), which renders the row as aligned column/value lines
    cr1.connect("add", lambda x: x.annotation.set_text(df.iloc[x.index, :].to_string()))
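    Because cr1 and cr2 need the same callback, one option is a small factory that closes over the DataFrame. This is a minimal sketch, assuming (as in the question) that every scatter collection was built from full columns of df in row order, so sel.index within either collection is also the DataFrame row position. The names make_row_annotator, use_to_string and FakeAnnotation are hypothetical; the demo uses a stand-in object instead of a real mplcursors Selection so it runs without a GUI.

```python
from types import SimpleNamespace

import pandas as pd

def make_row_annotator(frame, use_to_string=False):
    """Hypothetical factory: returns an "add" callback that shows row sel.index of frame."""
    def on_add(sel):
        row = frame.iloc[sel.index, :]  # assumes sel.index is the DataFrame row position
        if use_to_string:
            text = row.to_string()  # aligned "column   value" lines
        else:
            text = "\n".join(f"{k}: {v}" for k, v in row.to_dict().items())
        sel.annotation.set_text(text)
    return on_add

# Stand-in for an mplcursors Selection: only .index and .annotation.set_text are used.
class FakeAnnotation:
    def __init__(self):
        self.text = None

    def set_text(self, text):
        self.text = text

df = pd.DataFrame({"x1": [1, 2, 3], "y1": [-42, 7, 0],
                   "category": ["group1", "group2", "group4"]})
sel = SimpleNamespace(index=1, annotation=FakeAnnotation())
make_row_annotator(df)(sel)
print(sel.annotation.text)
```

    With the question's figure, this would be wired up as cr1.connect("add", make_row_annotator(df)) and cr2.connect("add", make_row_annotator(df)), so both subplots share one formatting function.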
