Search code examples
pythonmatplotlibpicker

Picker Event to display legend labels in matplotlib Ver. 2


Following with my last post Here

I'm still trying to do the same: when click on a point, I want it to display the x, y, and the ID. This time, the data is from .loc function and it's broken.

Sometimes it will display false information such as the wrong ID and wrong xy coordinates with some messy extra information such as the index and the dtype:

id: 12    6
Name: ID, dtype: int32
x: 12    8.682
Name: x, dtype: float64 y: 12    85.375436
Name: y, dtype: float64

Sometimes when you click on other point(s), you get this message:

Traceback (most recent call last):
      File "C:\Users\Dingyi Duan\anaconda3\lib\site-packages\matplotlib\cbook\__init__.py", line 224, in process
        func(*args, **kwargs)
      File "C:\Users\Dingyi Duan\.spyder-py3\temp.py", line 25, in onpick
        print('id:', ID[ind])
      File "C:\Users\Dingyi Duan\anaconda3\lib\site-packages\pandas\core\series.py", line 877, in __getitem__
        return self._get_with(key)
      File "C:\Users\Dingyi Duan\anaconda3\lib\site-packages\pandas\core\series.py", line 912, in _get_with
        return self.loc[key]
      File "C:\Users\Dingyi Duan\anaconda3\lib\site-packages\pandas\core\indexing.py", line 895, in __getitem__
        return self._getitem_axis(maybe_callable, axis=axis)
      File "C:\Users\Dingyi Duan\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1113, in _getitem_axis
        return self._getitem_iterable(key, axis=axis)
      File "C:\Users\Dingyi Duan\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1053, in _getitem_iterable
        keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
      File "C:\Users\Dingyi Duan\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1266, in _get_listlike_indexer
        self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
      File "C:\Users\Dingyi Duan\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1308, in _validate_read_indexer
        raise KeyError(f"None of [{key}] are in the [{axis_name}]")
    KeyError: "None of [Int64Index([2], dtype='int64')] are in the [index]"    

Ideally, I want to write a full function for any given dataframe with a similar structure (x, y, ID, category) that generates a scatter plot with points colored by a third column(i.e. 'ID'), with the legend showing each color plotted; Then when you click on any point, it will simply give information of "ID: xxx, x: xxx, y: xxx" - basically a substitute for the Matlab plotting.

Please see the full code below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random


A = np.random.uniform(0, 100, 50)
B = np.random.uniform(0, 100, 50)
C = np.random.randint(0,25,50)
D = [0]*25 + [1]*25
random.shuffle(D)


# x y data and legend labels
df = pd.DataFrame({"x": A, "y": B, "ID": C, "cat": D})


x = df['x'].loc[df['cat']==0]
y = df['y'].loc[df['cat']==0]
ID = df['ID'].loc[df['cat']==0]

# define the event
def onpick(event):
    ind = event.ind
    print('id:', ID[ind])
    print('x:', x[ind], 'y:', y[ind])

# create the plot
fig, ax = plt.subplots(figsize=(8, 6), dpi=100)
scatter = ax.scatter(x, y, c = ID, picker=True)

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend(*scatter.legend_elements(num=list(np.unique(ID))),
          loc="center left", 
          title='ID', 
          bbox_to_anchor=(1, 0.5),
          ncol=2
         )    
ax.ticklabel_format(useOffset=False)
ax.tick_params(axis = 'x',labelrotation = 45)
plt.tight_layout()


# call the event
fig.canvas.mpl_connect('pick_event', onpick)
plt.show()

Solution

  • When you use the picker after using loc or any other type of dataframe manipulation that affects the index, you will have to first reset the new index with reset_index. This will correct any disconnect between the picker's index and the dataframe's index. Next, to remove that excessive information that is being displayed, put the column values (for x, y, and ID) into arrays by placing .values at the end of your defining code. This will clean them up to just be a number/value to be displayed by the picker event. I'll provide two ways of doing it:

    df = pd.DataFrame({"x": A, "y": B, "ID": C, "cat": D})
    x = df['x'].loc[df['cat']==0].reset_index(drop=True).values
    y = df['y'].loc[df['cat']==0].reset_index(drop=True).values
    ID = df['ID'].loc[df['cat']==0].reset_index(drop=True).values
    

    or

    df = pd.DataFrame({"x": A, "y": B, "ID": C, "cat": D})
    notCatDF = df[df["cat"].eq(0)].reset_index(drop=True)
    x = notCatDF['x'].values
    y = notCatDF['y'].values
    ID = notCatDF['ID'].values
    

    The call to reset_index will make another column called "index" that keeps the original indexes before resetting it, I like to remove that column with the code drop=True.

    Put it all together and you get the output you want (picking random point): enter image description here