I am using the fantastic plotly library to draw 3D scatter diagrams and am trying to determine how to calculate the size of the bubbles.
Note that the data itself is not that important (and it would be hard to show here), other than that the size of the bubbles should scale with the value of the data in the "size" attribute. Unfortunately, the values of this data vary from time to time, so setting a fixed "size" value is not practical. plotly provides the "sizeref" attribute (see code below), which scales the size of the bubbles. I have found a formula (on the plotly site) that works for 2D but does not seem to apply to 3D charts.
My question is this: is there a convenient formula to calculate the value of sizeref? I am thinking that the sizeref formula would depend on the max/min of the data (i.e. the data for the "size" attribute) and the layout size (800 height and 800 width, as per the code below). I have tried a number of my own formulas, but none work well.
Any ideas would be appreciated. (Note: I am using Python, but I suspect the solution would be just as applicable to plotly code in R.)
import plotly
import plotly.graph_objs as go

#
# The dataframe, df, is calculated elsewhere
#
x = list(df["comp-0"])
y = list(df["comp-1"])
z = list(df["comp-2"])
text = list(df["label"])
color = list(df["cluster"])
size = list(df["degree"])

sizeref = 50
sizemin = 1

trace1 = go.Scatter3d(
    x=x, y=y, z=z,
    text=text,
    mode="markers",
    marker=dict(
        sizemode="diameter",
        sizeref=sizeref,
        sizemin=sizemin,
        size=size,
        color=color,
        colorscale="Viridis",
        line=dict(color="rgb(150, 150, 150)")
    )
)

data = [trace1]
title = "Clusters"
layout = go.Layout(height=800, width=800, title=title)
fig = go.Figure(data=data, layout=layout)
plotly.offline.plot(fig)
The formula I used in Plotly Express is here: https://github.com/plotly/plotly.py/blob/8445f916fa84fe17cfc15e95354c0a870113ad8c/packages/python/plotly/plotly/express/_core.py#L1721
sizeref = df["size_column"].max() / max_size ** 2
Some notes:
sizemode is area and not diameter, which is the perceptually optimal thing to do given how humans perceive size. If you want to use diameter mode, you can use sizeref = df["size_column"].max() / max_size
The sizemin argument is a "clipping" argument, meaning that any marker whose size "would be" lower than sizemin is rendered at sizemin
max_size in Plotly Express is 20, and I've found that values between 15 and 60 can look good, depending on the data, the number of subplots, etc.
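To make the two formulas and the sizemin clipping concrete, here is a minimal sketch in plain Python. The helper names compute_sizeref and approx_marker_size are hypothetical (they are not part of plotly), and approx_marker_size assumes a simplified model in which the rendered size is size / sizeref in diameter mode and sqrt(size / sizeref) in area mode; the actual renderer, particularly for 3D traces, may apply additional constant factors, so treat the numbers as relative rather than exact pixel values.

```python
import math


def compute_sizeref(sizes, max_size=20, sizemode="area"):
    """Hypothetical helper: derive sizeref so the largest value maps to max_size."""
    largest = max(sizes)
    if largest <= 0:
        raise ValueError("sizes must contain at least one positive value")
    if sizemode == "area":
        # Same shape as the Plotly Express formula quoted above.
        return largest / max_size ** 2
    if sizemode == "diameter":
        return largest / max_size
    raise ValueError("sizemode must be 'area' or 'diameter'")


def approx_marker_size(value, sizeref, sizemode="area", sizemin=0):
    """Simplified model of the rendered size (an assumption, not plotly's exact code)."""
    scaled = value / sizeref
    if sizemode == "area":
        scaled = math.sqrt(scaled)
    # sizemin acts as a clipping floor for small markers.
    return max(scaled, sizemin)


# Made-up example values standing in for df["degree"].
degrees = [1, 4, 25, 100]

sizeref_area = compute_sizeref(degrees, max_size=20, sizemode="area")
sizeref_diameter = compute_sizeref(degrees, max_size=20, sizemode="diameter")

print(sizeref_area, sizeref_diameter)
print([approx_marker_size(d, sizeref_area, "area", sizemin=1) for d in degrees])
print([approx_marker_size(d, sizeref_diameter, "diameter", sizemin=1) for d in degrees])
```

The resulting value could then be passed as sizeref in the marker dict of the go.Scatter3d trace from the question, with sizemode set to match. Because sizeref is recomputed from the current maximum, the largest bubble stays at roughly max_size even when the data changes between runs.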