
Plotly box plot with multiple categories

Consider the following toy data:

import pandas as pd
import numpy as np
from plotly import graph_objects as go
from plotly.subplots import make_subplots

np.random.seed(42)

df = pd.DataFrame(
    {
        "val1": np.random.normal(0, 1, size=100),
        "val2": np.random.normal(5, 2, size=100),
        "cat": np.random.choice(["a", "b"], size=100),
    }
)

which yields (top 5 rows):

	val1	val2	cat
0	0.496714	2.16926	b
1	-0.138264	4.15871	b
2	0.647689	4.31457	a
3	1.52303	3.39545	b
4	-0.234153	4.67743	a

My objective is to get two box plots each containing two boxes (one per category).

The following code:

fig = make_subplots(rows=2, cols=1, subplot_titles=["Value 1 dist", "Value 2 dist"])

fill_colors = {"a": "rgba(150, 25, 40, 0.5)", "b": "rgba(25, 150, 40, 0.5)"}

for i, val in enumerate(["val1", "val2"]):
    for c in df["cat"].unique():
        dff = df[df["cat"] == c]
        fig.add_trace(
            go.Box(
                y=dff[val],
                x=dff["cat"],
                boxmean="sd",
                name=c,
                showlegend=True if val=="val1" else False,
                fillcolor=fill_colors[c],
                line={"color": fill_colors[c]},
            ),
            row=i + 1,
            col=1,
        )

brings me very close.

Here are the things I would like to adjust:

1. How do I get, programmatically, the first 2 (or n) colors of Plotly's default color cycle, so that the result is consistent with other plots? Note that I hardcoded the colors.
2. The legend on the left: is there a more programmatic way to show only a single legend entry per category? Note that I used showlegend=True if val=="val1" else False.
3. Bonus: how can I control the order of the boxes (i.e. which category comes first)?

I previously posted two related questions (here and here), but the answers there didn't help me tune my plot the way I want.

Solution

1. For the colors, please refer to the official reference on discrete color sequences. px.colors.qualitative.Plotly is a list holding the colors of the standard sequence, so you can index into it instead of hardcoding values.
2. As for duplicate legend entries, I personally have no problem with your method; I use it myself and it is a common approach. To handle it programmatically, I would collect the trace names in a set() and hide the legend entry of any trace whose name is already in the set. I learned this tip from this answer.
3. For the order of the boxes, set categoryorder on the x axes to sort the categories in ascending or descending order.

This is a response from someone who did not get the expected answer. What was unsatisfactory about my previous answers? I will respond whenever possible.

import pandas as pd
import numpy as np
from plotly import graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# Default qualitative color sequence, see
# https://plotly.com/python/discrete-color/#color-sequences-in-plotly-express
# (df is the DataFrame built in the question)
plotly_default = px.colors.qualitative.Plotly
print(plotly_default)

fig = make_subplots(rows=2, cols=1, subplot_titles=["Value 1 dist", "Value 2 dist"])

fill_colors = {"a": plotly_default[0], "b": plotly_default[1]}

for i, val in enumerate(["val1", "val2"]):
    for c in df["cat"].unique():
        dff = df[df["cat"] == c]
        fig.add_trace(
            go.Box(
                y=dff[val],
                x=dff["cat"],
                boxmean="sd",
                name=c,
                showlegend=True,  # duplicate entries are hidden after the loop
                fillcolor=fill_colors[c],
                line={"color": fill_colors[c]},
                opacity=0.5
            ),
            row=i + 1,
            col=1,
        )
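
# Hide the legend entry of every trace whose name has already been seen,
# so that each category appears in the legend only once.
names = set()
fig.for_each_trace(
    lambda trace: trace.update(showlegend=False)
    if trace.name in names
    else names.add(trace.name)
)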

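# Sort the categories alphabetically on every x axis
# (use 'category descending' for the opposite order).
fig.update_xaxes(categoryorder='category ascending')
# Reverse the order in which the legend entries are listed.
fig.update_layout(legend=dict(traceorder='reversed'))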
fig.show()