I have multiple clusters, and each data point in a cluster belongs to a group. I am trying to highlight data points (in yellow) in a plotly scatter plot based on the group selected from a dropdown list.
Here is code to generate sample data:
import pandas as pd
import numpy as np

def generate_random_cluster(name, size, loc_x, loc_y, groups=['A','B','C'], p=None):
    # Draw `size` points around (loc_x, loc_y); each point gets a group sampled with probabilities `p`
    return pd.DataFrame({
        'name': name,
        'x': np.random.normal(loc=loc_x, size=size),
        'y': np.random.normal(loc=loc_y, size=size),
        'group': np.random.choice(groups, size=size, p=p)
    })

groups = ['A','B','C']
cluster_1 = generate_random_cluster(name='cluster_1', size=15, loc_x=3, loc_y=2, groups=groups, p=[0.7, 0.2, 0.1])
cluster_2 = generate_random_cluster(name='cluster_2', size=35, loc_x=9, loc_y=5, groups=groups, p=[0.2, 0.7, 0.1])
cluster_3 = generate_random_cluster(name='cluster_3', size=20, loc_x=6, loc_y=8, groups=groups, p=[0.1, 0.2, 0.7])
data = pd.concat([cluster_1, cluster_2, cluster_3]).reset_index(drop=True)
data.head()
This returns a dataframe like this:
name | x | y | group |
---|---|---|---|
cluster_1 | 3.198048 | 0.385736 | B |
cluster_1 | 1.784080 | 2.608631 | A |
cluster_1 | 4.160103 | 2.119545 | A |
cluster_1 | 2.522486 | 1.994962 | B |
cluster_1 | 4.073054 | 1.204167 | A |
I am quite new to plotly, but based on the documentation I thought I just needed to use the `update_layout` method like this:
import plotly.graph_objects as go

cluster_colors = {'cluster_1': 'green', 'cluster_2': 'red', 'cluster_3': 'blue'}

layout = go.Layout(
    xaxis = go.layout.XAxis(
        showticklabels=False),
    yaxis = go.layout.YAxis(
        showticklabels=False
    )
)

fig = go.Figure(layout=layout)

for cluster_ix, (cluster, df) in enumerate(data.groupby('name')):
    customdata = df['group']
    fig.add_scatter(
        x=df['x'],
        y=df['y'],
        name=cluster,
        mode='markers',
        customdata=customdata,
        hovertemplate="<br>".join([
            "X: %{x}",
            "Y: %{y}",
            "Group: %{customdata}"
        ]),
        marker_color=[cluster_colors[cluster] for _ in range(len(df))],
    )

def highlight_group(group):
    result = []
    for tracer_ix, tracer in enumerate(fig["data"]):
        colors = [
            "yellow" if datapoint_group == group else cluster_colors[fig["data"][tracer_ix]["name"]]
            for datapoint_group in fig["data"][tracer_ix]["customdata"]
        ]
        result.append(colors)
    return result

fig.update_layout(
    updatemenus=[
        {
            "buttons": [
                {
                    "label": group,
                    "method": "update",
                    "args": [
                        {"marker": {"color": highlight_group(group)}}
                    ],
                }
                for group in groups
            ]
        }
    ],
    margin={"l": 0, "r": 0, "t": 25, "b": 0},
    height=700
)
fig.show()
This generates a plot like this:
But when I change the value in the dropdown list, every marker turns black:
How do I correctly highlight the selected markers?
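Based on @jmmease's answer here in the plotly forums, I believe you need to replace the nested marker dictionary with the flattened key `"marker.color"`. As far as I understand it, a list passed under a flattened key is split across the traces, one element per trace, so each trace receives its own list of colors: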
fig.update_layout(
    updatemenus=[
        {
            "buttons": [
                {
                    "label": group,
                    "method": "update",
                    "args": [
                        {"marker.color": highlight_group(group)}
                    ],
                }
                for group in groups
            ]
        }
    ],
    margin={"l": 0, "r": 0, "t": 25, "b": 0},
    height=700
)
Here is the result:
This accomplishes what you asked in your original question, but from a design perspective, you might want to add another dropdown option so that you can select no groups – otherwise, once you select a group, you cannot return the figure to its original state.
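Since your code is pretty robust, you can iterate through `groups+["None"]` instead of `groups` when creating the buttons (this way `groups` itself is not modified). No data point belongs to a group called `None`, so selecting that option leaves every marker in its cluster color. You will then have another dropdown option with the label `None`: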
fig.update_layout(
    updatemenus=[
        {
            "buttons": [
                {
                    "label": group,
                    "method": "update",
                    "args": [
                        {"marker.color": highlight_group(group)}
                    ],
                }
                for group in groups+["None"]
            ]
        }
    ],
    margin={"l": 0, "r": 0, "t": 25, "b": 0},
    height=700
)
Then the result looks like this:
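To make the shape of the button arguments easier to see, here is a minimal sketch that builds the same kind of buttons from plain Python lists, without a figure. The helper names `per_trace_colors` and `build_buttons` and the toy input are my own for illustration, and the sketch assumes that no real group is literally named "None".

```python
def per_trace_colors(trace_groups, base_colors, selected, highlight="yellow"):
    """Return one color list per trace; points in `selected` get `highlight`."""
    return [
        [highlight if g == selected else base for g in groups_in_trace]
        for groups_in_trace, base in zip(trace_groups, base_colors)
    ]


def build_buttons(trace_groups, base_colors, labels):
    """Build dropdown buttons as plain dicts, plus a trailing "None" reset."""
    return [
        {
            "label": label,
            "method": "update",
            # The outer list has one entry per trace, so each trace
            # receives its own inner list of per-point colors.
            "args": [
                {"marker.color": per_trace_colors(trace_groups, base_colors, label)}
            ],
        }
        for label in labels + ["None"]
    ]


# Toy input (hypothetical): two traces with three and two points.
trace_groups = [["A", "B", "A"], ["B", "C"]]
base_colors = ["green", "red"]
buttons = build_buttons(trace_groups, base_colors, ["A", "B", "C"])
```

The resulting list can be passed the same way as above, as `updatemenus=[{"buttons": buttons}]`. The "None" button simply carries the original colors for every point, which is why it acts as a reset.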
This next part is beyond the scope of your original question, but the legend may become confusing. When you create the figure, each legend entry is tied to a cluster trace rather than to a single marker color. Once you select a group to color yellow, a trace contains some yellow markers and some markers in their original color, and I believe plotly then has to pick one color for the legend entry, probably the color of the first marker in the trace.
For example, once we select `Group B` from the dropdown, `cluster 3` is still mostly blue markers, as you defined when creating the figure, but it also contains some yellow Group B markers, and this causes the legend entry to be colored yellow. The same issue exists for `cluster 2`, which is mostly red markers but contains some yellow Group B markers. If I think of a solution, I'll update my answer.
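One possible workaround, which I have not verified against your figure, is to leave the cluster traces untouched and draw the highlight as separate yellow overlay traces, one per group, that the dropdown shows or hides. The legend entries then keep their cluster colors because the cluster traces never change. The sketch below only builds the visibility flags for the buttons; `visibility_mask` and `build_visibility_buttons` are hypothetical helper names, and it assumes the overlay traces are added after the cluster traces, in the same order as `groups`.

```python
def visibility_mask(n_clusters, groups, selected):
    """Flags for n_clusters base traces followed by one overlay trace per group."""
    base = [True] * n_clusters                    # cluster traces always stay visible
    overlays = [g == selected for g in groups]    # show only the selected group's overlay
    return base + overlays


def build_visibility_buttons(n_clusters, groups):
    """One button per group plus a "None" button that hides every overlay."""
    return [
        {
            "label": label,
            "method": "update",
            # One flag per trace, in the order the traces were added.
            "args": [{"visible": visibility_mask(n_clusters, groups, label)}],
        }
        for label in groups + ["None"]
    ]


# Three cluster traces and three groups, as in the question.
visibility_buttons = build_visibility_buttons(3, ["A", "B", "C"])
```

Each overlay trace would hold the points of one group across all clusters, be added with a yellow marker color, `showlegend=False` and `visible=False`, and the buttons would be passed as `updatemenus=[{"buttons": visibility_buttons}]`. The trade-off is that the figure carries extra traces, and hovering over a highlighted point reports the overlay trace unless you give it the same hover template.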