I have two sets of data in separate lists. Each list element has a value from 0 to 100, and elements repeat.
For example:
first_data = [10,20,40,100,...,100,10,50]
second_data = [20,50,50,10,...,70,10,100]
I can plot one of these in a histogram using:
import plotly.graph_objects as go
.
.
.
fig = go.Figure()
fig.add_trace(go.Histogram(histfunc='count', x=first_data))
fig.show()
By setting histfunc to 'count', my histogram consists of an x-axis from 0 to 100 and bars for the number of repeated elements in first_data.
My question is: How can I overlay the second set of data over the same axis using the same "count" histogram?
One way to do this is simply to add another trace; you were nearly there! The dataset used to create these examples can be found in the last section of this post.
Note:
The following code uses the 'lower-level' plotly API because (personally) I feel it's more transparent: it lets the user see what is being plotted, and why, rather than relying on the convenience modules of graph_objects and express.
from plotly.offline import plot
layout = {}
traces = []
traces.append({'x': data1, 'name': 'D1', 'opacity': 1.0})
traces.append({'x': data2, 'name': 'D2', 'opacity': 0.5})
# For each trace, add elements which are common to both.
for t in traces:
    t.update({'type': 'histogram',
              'histfunc': 'count',
              'nbinsx': 50})
layout['barmode'] = 'overlay'
plot({'data': traces, 'layout': layout})
Another option is to plot the curve (Gaussian KDE) of the distribution, as shown here. It's worth noting that this method plots the probability density, rather than the counts.
X1, Y1 = calc_curve(data1)
X2, Y2 = calc_curve(data2)
traces = []
traces.append({'x': X1, 'y': Y1, 'name': 'D1'})
traces.append({'x': X2, 'y': Y2, 'name': 'D2'})
plot({'data': traces})
Associated calc_curve() function:
from scipy.stats import gaussian_kde
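def calc_curve(data):
    """Calculate probability density."""
    # Built-in min/max work for plain lists as well as NumPy arrays.
    min_, max_ = min(data), max(data)
    # 501 evenly spaced points spanning the data range.
    X = [min_ + i * ((max_ - min_) / 500) for i in range(501)]
    Y = gaussian_kde(data).evaluate(X)
    return X, Y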
Or, you can always combine the two methods, using the probability density on the y-axis.
layout = {}
traces = []
traces.append({'x': data1, 'name': 'D1', 'opacity': 1.0})
traces.append({'x': data2, 'name': 'D2', 'opacity': 0.5})
for t in traces:
    t.update({'type': 'histogram',
              'histnorm': 'probability density',
              'nbinsx': 50})
traces.append({'x': X1, 'y': Y1, 'name': 'D1'})
traces.append({'x': X2, 'y': Y2, 'name': 'D2'})
layout['barmode'] = 'overlay'
plot({'data': traces, 'layout': layout})
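If you would rather keep counts on the y-axis, note that with fixed-width bins the density is count / (n * width), so a density value can be converted back to a count scale by multiplying by n * width. Here is a minimal standard-library sketch of that relationship. The helper names are hypothetical and the bin width is an assumption: plotly treats nbinsx as a maximum and chooses the actual bin size itself.

```python
import random

def histogram_counts(values, start, width, nbins):
    """Count values per fixed-width bin; the last bin includes its right edge."""
    counts = [0] * nbins
    for v in values:
        idx = min(int((v - start) / width), nbins - 1)
        counts[idx] += 1
    return counts

def counts_to_density(counts, width):
    """Normalise counts so that the total bar area equals 1."""
    n = sum(counts)
    return [c / (n * width) for c in counts]

def density_to_counts(density, n, width):
    """Invert the normalisation: density * n * width gives counts again."""
    return [d * n * width for d in density]

random.seed(4)
values = [random.uniform(0, 100) for _ in range(1000)]  # simulated 0-100 data
width = 2.0                                             # assumed bin width
counts = histogram_counts(values, 0.0, width, 50)
density = counts_to_density(counts, width)
area = sum(d * width for d in density)                  # total bar area
recovered = density_to_counts(density, len(values), width)
```

The same factor, n * width, could be applied to the Y values returned by calc_curve() to draw the KDE curve over a count histogram, provided the bin width is set explicitly.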
Here is the bit of code used to simulate your dataset of [0,100] values, and to create these examples:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
mms = MinMaxScaler((0, 100))
np.random.seed(4)
data1 = mms.fit_transform(np.random.randn(10000).reshape(-1, 1)).ravel()
data2 = mms.fit_transform(np.random.randn(10000).reshape(-1, 1)).ravel()
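For reference, the rescaling done by MinMaxScaler((0, 100)) above is plain min-max scaling, which can be sketched without scikit-learn. The function name below is hypothetical, and the sketch assumes the input contains at least two distinct values (otherwise the span is zero).

```python
def min_max_scale(values, low=0.0, high=100.0):
    """Linearly map values so that min(values) -> low and max(values) -> high."""
    lo, hi = min(values), max(values)
    span = hi - lo  # assumed non-zero
    return [low + (v - lo) / span * (high - low) for v in values]

scaled = min_max_scale([-2.0, 0.0, 1.0, 2.0])
```

Because each dataset is scaled with its own minimum and maximum, both data1 and data2 span exactly 0 to 100 regardless of the raw random values.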