Overlaying two histograms with plotly express


I'd like to overlay two histograms that I currently display one after the other, using the simple code below. The two dataframes are not the same length, but it still makes sense to overlay their histograms.

import plotly.express as px

fig1 = px.histogram(test_lengths, x='len', histnorm='probability', nbins=10)
fig2 = px.histogram(train_lengths, x='len', histnorm='probability', nbins=10)
fig1.show()
fig2.show()

With plain Plotly (graph_objects), this is the way to do it, copied from the documentation:

import plotly.graph_objects as go

import numpy as np

x0 = np.random.randn(500)
# Add 1 to shift the mean of the Gaussian distribution
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

# Overlay both histograms
fig.update_layout(barmode='overlay')
# Reduce opacity to see both histograms
fig.update_traces(opacity=0.75)
fig.show()

I just wonder whether there's a particularly idiomatic way to do this with Plotly Express. Hopefully this also serves to exemplify the completeness of, and the different levels of abstraction between, Plotly and Plotly Express.


Solution

  • The trick is to build a single Plotly Express figure from one tidy dataframe that holds both datasets, rather than making two figures and trying to combine them (which Plotly Express does not currently support):

    import numpy as np
    import pandas as pd
    import plotly.express as px
    
    x0 = np.random.randn(250)
    # Add 1 to shift the mean of the Gaussian distribution
    x1 = np.random.randn(500) + 1
    
    df = pd.DataFrame(dict(
        series=np.concatenate((["a"] * len(x0), ["b"] * len(x1))), 
        data=np.concatenate((x0, x1))
    ))
    
    # One trace per value of "series", drawn on top of each other
    fig = px.histogram(df, x="data", color="series", barmode="overlay")
    fig.show()

    This yields a single figure with both histograms overlaid, one color per series.