Tags: python, r, plotly

Is there a way to calculate an optimal sizeref value for plotly Scatter3d?


I am using the fantastic plotly library to draw 3D scatter diagrams and am trying to determine how to calculate the size of the bubbles.

The data itself is not that important (and it would be hard to show here), other than that the size of the bubbles should scale with the values in the "size" attribute. Unfortunately, these values vary from time to time, so setting a fixed "size" value is not practical. plotly provides the "sizeref" attribute (see code below), which scales the size of the bubbles. I have found a formula (on the plotly site) that works for 2D but does not seem to apply to 3D charts.

My question is this: is there a convenient formula to calculate the value of sizeref? I am thinking that the sizeref formula would depend on the max/min of the data (i.e. the data for the "size" attribute) and the layout size (800 height and 800 width, as per the code below). I have tried a number of my own formulas but none work well.

Any ideas would be appreciated. (Note: I am using Python, but I suspect the solution would be just as applicable to plotly code in R.)

import plotly
import plotly.graph_objs as go

#
# The dataframe, df, is calculated elsewhere
#

x = list(df["comp-0"])
y = list(df["comp-1"])
z = list(df["comp-2"])

text = list(df["label"])
color = list(df["cluster"])
size = list(df["degree"])
sizeref = 50
sizemin = 1

trace1 = go.Scatter3d(
    x=x, y=y, z=z,
    text=text,
    mode="markers",
    marker=dict(
        sizemode="diameter",
        sizeref=sizeref,
        sizemin=sizemin,
        size=size,
        color=color,
        colorscale="Viridis",
        line=dict(color="rgb(150, 150, 150)")
    )
)

data = [trace1]
title = "Clusters"
layout = go.Layout(height=800, width=800, title=title)

fig = go.Figure(data=data, layout=layout)
plotly.offline.plot(fig)

Solution

  • The formula I used in Plotly Express is here: https://github.com/plotly/plotly.py/blob/8445f916fa84fe17cfc15e95354c0a870113ad8c/packages/python/plotly/plotly/express/_core.py#L1721

    sizeref = df["size_column"].max() / max_size ** 2
    

    Some notes:

    • This formula assumes sizemode is area and not diameter, which is the perceptually optimal thing to do given how humans perceive size. If you want to use diameter mode you can use sizeref = df["size_column"].max() / max_size
    • This formula doesn't take into account a "minimum" size because Plotly always considers the minimum size to be 0 when the data is 0. You can't map an arbitrary range to size. The sizemin argument is a "clipping" argument meaning that any marker whose size "would be" lower than sizemin is rendered at sizemin
    • The default value of max_size in Plotly Express is 20, and I've found that values between 15 and 60 can look good, depending on the data and the number of subplots etc.
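
To make the notes above concrete, here is a minimal sketch of both variants of the formula, plus a rough model of how sizemin clips small markers. The helper names (compute_sizeref, rendered_size) and the sample degree values are made up for illustration and are not part of plotly. The pixel model simply follows the notes above (square root of size / sizeref in area mode, size / sizeref in diameter mode); actual 3D rendering may differ somewhat.

```python
import math


def compute_sizeref(values, max_size=20, sizemode="area"):
    """Return a sizeref so the largest value renders at roughly max_size pixels.

    Hypothetical helper that mirrors the formula quoted in the answer.
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("size values must include at least one positive number")
    if sizemode == "area":
        return largest / max_size ** 2
    if sizemode == "diameter":
        return largest / max_size
    raise ValueError("sizemode must be 'area' or 'diameter'")


def rendered_size(value, sizeref, sizemode="area", sizemin=0):
    """Approximate marker size in pixels (an assumed model, see the lead-in)."""
    scaled = value / sizeref
    pixels = math.sqrt(scaled) if sizemode == "area" else scaled
    # sizemin only clips: anything that would be smaller is drawn at sizemin.
    return max(pixels, sizemin)


# Made-up stand-in for list(df["degree"]) from the question.
degrees = [1, 4, 25, 100]

sizeref_area = compute_sizeref(degrees, max_size=20, sizemode="area")
sizeref_diameter = compute_sizeref(degrees, max_size=20, sizemode="diameter")

# Plain dict that could be passed as marker=... to go.Scatter3d.
marker = dict(sizemode="area", sizeref=sizeref_area, sizemin=1, size=degrees)
```

Because sizeref is recomputed from the data's maximum each time, the largest bubble stays near max_size even when the "degree" values change between runs. Note that the figure's width and height do not enter the formula.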