Tags: python, plotly, ipywidgets

How to remove points from a dataframe based on a selected area on a plot


I have experimental data that often contains artifacts, such as the random spikes in the example plot (image not included here).

I need a quick way to manually select these random spikes and remove them from datasets.

I figured that any plotting library with a focus on interactive plots would offer an easy way to do this, but so far I have not found a simple one.

I'm a Matplotlib/Seaborn guy, and this calls for an interactive solution. I briefly checked Plotly, Bokeh, and Altair and decided to go with Plotly. My first attempt looks like this:


import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interactive, HBox, VBox, Button

url='https://drive.google.com/file/d/1hCX8Bn_y30aXVN_TyHTTx015u44pO9yB/view?usp=sharing'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]

df = pd.read_csv(url, index_col=0)

f = go.FigureWidget()
for col in df.columns[-1:]: 
    f.add_scatter(x = df.index, y=df[col], mode='markers+lines', 
                  selected_marker=dict(size=5, color='red'),
                  marker=dict(size=1, color='lightgrey', line=dict(width=1, color='lightgrey')))
    
t = go.FigureWidget([go.Table(
    header=dict(values=['selector range'],
                fill = dict(color='#C2D4FF'),
                align = ['left'] * 5),
    cells=dict(values=['None selected' for col in ['ID']],
               fill = dict(color='#F5F8FF'),
               align = ['left'] * 5)
                )])
    
def selection_fn(trace,points,selector):
    t.data[0].cells.values = [selector.xrange]


def update_axes(dataset):
    scatter = f.data[0]
    scatter.x = df.index
    scatter.y = df[dataset]
    
f.data[0].on_selection(selection_fn)


axis_dropdowns = interactive(update_axes, dataset = df.columns)

button1 = Button(description="Remove points")
button2 = Button(description="Reset")
button3 = Button(description="Fit data")


VBox((HBox((axis_dropdowns.children)), HBox((button1, button2, button3)), f,t))

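Which gives the dataset dropdown, the three buttons, the scatter plot, and the selector-range table stacked vertically (screenshot not included here).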

So I managed to get the box selector's x coordinates (and, for now, print them in the table widget). What I couldn't figure out is how to bind a function to button1 that takes the box selector coordinates, removes the selected points from the dataframe, and replots the data. Something like this pseudocode:

def on_button_click_remove(scatter.selector.xrange):
    mask = (df.index >= scatter.selector.xrange[0]) & (df.index <= scatter.selector.xrange[1]) 
    clean_df = df.drop(df.index[mask])
    scatter(data = clean_df...) #update scatter plot

button1 = Button(description="Remove points", on_click = on_button_click_remove)

I checked https://plotly.com/python/custom-buttons/ but I am still not sure how to use it for my purpose.
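
The masking step in the pseudocode above can be checked on its own, without any plotting library. A minimal sketch, assuming a numeric index; the helper name `drop_x_range` and the toy data are hypothetical and only stand in for the real dataset:

```python
import pandas as pd


def drop_x_range(frame, x_range):
    """Return the rows of `frame` whose index lies outside `x_range`.

    `x_range` is a (low, high) pair such as the box selector's xrange.
    The bounds are sorted first in case they arrive in reverse order.
    """
    low, high = sorted(x_range)
    inside = (frame.index >= low) & (frame.index <= high)
    return frame.loc[~inside]


# Toy data standing in for the experimental trace: one spike at x = 3.
df = pd.DataFrame({"A": [1.0, 1.1, 0.9, 9.0, 1.0]}, index=[0, 1, 2, 3, 4])
clean_df = drop_x_range(df, (2.5, 3.5))
print(clean_df)
```

The original frame is left untouched here, which keeps a "Reset" button simple: it only needs to replot `df`.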


Solution

  • I suggest using HoloViews and Panel. They are high-level visualization tools that make it easier to create and control lower-level Bokeh, Matplotlib, or Plotly figures.

    Here is an example:

    import panel as pn
    import holoviews as hv 
    import pandas as pd
    from bokeh.models import ColumnDataSource
    
    # This example uses Bokeh as the backend.
    # You can try Plotly or Matplotlib with minor modifications to the code below.
    # For example, you can use the on_selection callback from Plotly:
    #    https://plotly.com/python/v3/selection-events/
    hv.extension('bokeh') 
    display( pn.extension( ) ) # activate panel
    
    df=pd.read_csv('spiked_data.csv',index_col=0).reset_index()
    
    pt = hv.Points( 
           data=df, kdims=['index', 'A' ] 
         ).options( marker='x', size=2, 
           tools=['hover', 'box_select', 'lasso_select', 'reset'], 
           height=250, width=600 
         )
    
    fig    = hv.render(pt)
    source = fig.select({'type':ColumnDataSource})
    
    bt = pn.widgets.Button(name='remove selected')
    def rm_sel(evt):
        i =  df.iloc[source.selected.indices].index  # get index to delete
        df.drop(i, inplace=True, errors='ignore')    # modify dataframe
        source.data = df           # update data source
        source.selected.indices=[] # clear selection
        pn.io.push_notebook(app)   # update figure
    bt.on_click(rm_sel)
    
    app=pn.Column(fig,'Click to delete the selected points', bt)
    display(app)
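
The `df.iloc[...].index` step in `rm_sel` matters because the selection reports row positions in the data source, while `DataFrame.drop` expects index labels. Once a row has been removed, the two no longer coincide. A small pandas-only sketch with made-up data and hypothetical selection lists:

```python
import pandas as pd

# Toy frame with labels 0..4, standing in for the spiked data.
df = pd.DataFrame({"A": [1.0, 9.0, 1.1, 8.0, 0.9]})

# First removal: positions and labels still coincide.
selected = [1]  # hypothetical positions reported by the selection tool
df = df.drop(df.iloc[selected].index)

# Second removal: position 2 now holds the row labelled 3.
selected = [2]
labels = df.iloc[selected].index
df = df.drop(labels)

print(list(labels), df["A"].tolist())
```

Passing the positions straight to `drop` in the second step would have removed the row labelled 2 instead of the spike.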
    


    A related example can be found in this SO answer
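
For readers who would rather stay with the Plotly `FigureWidget` from the question, the same idea can be sketched there too. This is not part of the answer above, only a sketch under assumptions: it needs a Jupyter environment where `FigureWidget` renders, and the names `build_app`, `drop_positions`, and `state` are hypothetical. The selection callback stores the selected point positions, and the button callback (which ipywidgets calls with the button itself as its only argument) drops them and redraws.

```python
import pandas as pd


def drop_positions(frame, positions):
    """Return `frame` without the rows at the given integer positions."""
    return frame.drop(frame.iloc[list(positions)].index)


def build_app(df, col):
    """Build the plot and buttons for df[col]; returns (container, state)."""
    # Imported lazily: these are only needed inside a Jupyter session.
    import plotly.graph_objects as go
    from ipywidgets import Button, HBox, VBox

    state = {"df": df.copy(), "selected": []}
    fig = go.FigureWidget()
    fig.add_scatter(x=df.index, y=df[col], mode="markers+lines",
                    selected_marker=dict(size=5, color="red"))
    scatter = fig.data[0]

    def redraw():
        # Push the current frame to the trace and clear the highlight.
        with fig.batch_update():
            scatter.x = state["df"].index
            scatter.y = state["df"][col]
            scatter.selectedpoints = None

    def remember_selection(trace, points, selector):
        # point_inds are positions within the trace's current x/y arrays.
        state["selected"] = list(points.point_inds)

    def remove_selected(button):
        state["df"] = drop_positions(state["df"], state["selected"])
        state["selected"] = []
        redraw()

    def reset(button):
        state["df"] = df.copy()
        state["selected"] = []
        redraw()

    scatter.on_selection(remember_selection)
    remove_btn = Button(description="Remove points")
    reset_btn = Button(description="Reset")
    remove_btn.on_click(remove_selected)
    reset_btn.on_click(reset)
    return VBox([HBox([remove_btn, reset_btn]), fig]), state


# In a notebook cell (column name "A" is an assumption):
#   app, state = build_app(df, "A")
#   app                    # display the widgets
#   state["df"]            # cleaned data after removing points
```

Using `points.point_inds` rather than `selector.xrange` means the same callback also covers lasso selections, since it does not depend on the selection being a box.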