I have some experimental data that is often flawed with artifacts exemplified with something like this:
I need a quick way to manually select these random spikes and remove them from datasets.
I figured that any plotting library with a focus on interactive plots should have an easy way to do this but so far I keep struggling with finding a simple way to do what I want.
I'm a Matplotlib/Seaborn guy and this calls for interactive solution. I briefly checked Plotly, Bokeh and Altair and decided to go with the first one. My first attempt looks like this:
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interactive, HBox, VBox, Button
url='https://drive.google.com/file/d/1hCX8Bn_y30aXVN_TyHTTx015u44pO9yB/view?usp=sharing'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]
df = pd.read_csv(url, index_col=0)
f = go.FigureWidget()
for col in df.columns[-1:]:
f.add_scatter(x = df.index, y=df[col], mode='markers+lines',
selected_marker=dict(size=5, color='red'),
marker=dict(size=1, color='lightgrey', line=dict(width=1, color='lightgrey')))
t = go.FigureWidget([go.Table(
header=dict(values=['selector range'],
fill = dict(color='#C2D4FF'),
align = ['left'] * 5),
cells=dict(values=['None selected' for col in ['ID']],
fill = dict(color='#F5F8FF'),
align = ['left'] * 5)
)])
def selection_fn(trace,points,selector):
t.data[0].cells.values = [selector.xrange]
def update_axes(dataset):
scatter = f.data[0]
scatter.x = df.index
scatter.y = df[dataset]
f.data[0].on_selection(selection_fn)
axis_dropdowns = interactive(update_axes, dataset = df.columns)
button1 = Button(description="Remove points")
button2 = Button(description="Reset")
button3 = Button(description="Fit data")
VBox((HBox((axis_dropdowns.children)), HBox((button1, button2, button3)), f,t))
Which gives:
So I managed to get Selector Box x coordinates (and temporarily print them inside the table widget). But what I couldn't figure out is how to easily bind a function to button1 that would take as an argument Box Selector coordinates and remove selected points from a dataframe and replot the data. So something like this:
def on_button_click_remove(scatter.selector.xrange):
mask = (df.index >= scatter.selector.xrange[0]) & (df.index <= scatter.selector.xrange[1])
clean_df = df.drop(df.index[mask])
scatter(data = clean_df...) #update scatter plot
button1 = Button(description="Remove points", on_click = on_button_click_remove)
I checked https://plotly.com/python/custom-buttons/ but I am still not sure how to use it for my purpose.
I suggest to use Holoviews and Panel.
They are high level visualization tools that facilitate the creation and control of low level bokeh
, matplotlib
or plotly
figures.
Here are an example:
import panel as pn
import holoviews as hv
import pandas as pd
from bokeh.models import ColumnDataSource
# This example use bokeh as backend.
# You can try plotly or matplotlib with minor modification on the codes below.
# For example you can use on_selection callback from Plotly
# https://plotly.com/python/v3/selection-events/
hv.extension('bokeh')
display( pn.extension( ) ) # activate panel
df=pd.read_csv('spiked_data.csv',index_col=0).reset_index()
pt = hv.Points(
data=df, kdims=['index', 'A' ]
).options( marker='x', size=2,
tools=['hover', 'box_select', 'lasso_select', 'reset'],
height=250, width=600
)
fig = hv.render(pt)
source = fig.select({'type':ColumnDataSource})
bt = pn.widgets.Button(name='remove selected')
def rm_sel(evt):
i = df.iloc[source.selected.indices].index # get index to delete
df.drop(i, inplace=True, errors='ignore') # modify dataframe
source.data = df # update data source
source.selected.indices=[] # clear selection
pn.io.push_notebook(app) # update figure
bt.on_click(rm_sel)
app=pn.Column(fig,'Click to delete the selected points', bt)
display(app)
A related example can be found in this SO answer