
How to show a distribution on the side of a graph in Plotly (Python)?


I am graphing simulations of time series data in Python and want to visualize the distribution of the ending values. Given the plot of simulated paths, I want to take the values at the end (time = T) and draw a distribution curve (just a line, not a box plot) on the right side of the graph to show the skew. I've provided an example image of what I mean.

I'm using plotly to generate the graph as I want to be able to have hover data for each point in time. The data I'm using is generic, so any form of simulated path over any time horizon works. My current graph code goes as follows:

import plotly.express as px
import plotly.graph_objects as go
from plotly.graph_objs.scatter.marker import Line

fig = go.Figure(go.Scatter(
    x=y_testing.index,
    y=y_testing.iloc[:, 0]
))

for i in range(1, len(y_testing.columns)-1):
    fig.add_trace(go.Scatter(
        x=y_testing.index,
        y=y_testing.iloc[:, i]
    ))

fig.show()
  • y_testing.index is the time period
  • y_testing.iloc[:, 0] is the actual data
  • y_testing.iloc[:, i] is each respective simulated path

Solution

  • You can use subplots and place a one-sided violin plot next to your go.Scatter traces. To make the figure easier to read, I added a secondary y-axis so that the violin plot's tick marks sit on the right edge instead of between the scatter and violin plots, and I reduced the horizontal spacing between the two subplots.

    One useful tip: in Plotly, the default y-axis range is roughly [min - range/16, max + range/16], where range = max - min of the plotted data. I used this to set the violin plot's range manually so that it matches the scatter plot's.

    import numpy as np
    import plotly.express as px
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
    
    ## use a sample timeseries data set
    df = px.data.stocks()
    
    y_testing = df.set_index('date')
    y_testing.drop(columns=['AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT'], inplace=True)
    
    np.random.seed(42)
    for i in range(20):
        y_testing[f'GOOG_{i}'] = y_testing['GOOG'] + np.random.normal(loc=0.0, scale=0.05, size=len(y_testing))
    
    
    ## wide panel for the paths, narrow panel (with a secondary y-axis) for the violin
    fig = make_subplots(
        rows=1, cols=2,
        column_widths=[0.8, 0.2],
        horizontal_spacing=0.01,
        specs=[[{"secondary_y": False}, {"secondary_y": True}]]
    )
    
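    ## column 0 holds the actual data
    fig.add_trace(go.Scatter(
        x=y_testing.index,
        y=y_testing.iloc[:, 0]
    ), row=1, col=1)
    
    ## the remaining columns are the simulated paths; range() excludes its
    ## stop value, so stop at len(...) to include the last column
    for i in range(1, len(y_testing.columns)):
        fig.add_trace(go.Scatter(
            x=y_testing.index,
            y=y_testing.iloc[:, i]
        ), row=1, col=1)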
    
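    ## last row = the value of every column at the final time step;
    ## side='positive' draws only the right half of the violin
    fig.add_trace(go.Violin(
        y=y_testing.iloc[-1],
        side='positive'
    ), row=1, col=2, secondary_y=True)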
    
    ## reproduce the scatter's default y-axis range from the data ...
    y_testing_min = y_testing.min().min()
    y_testing_max = y_testing.max().max()
    y_testing_range = y_testing_max - y_testing_min
    y_range = [y_testing_min - y_testing_range/16, y_testing_max + y_testing_range/16]
    ## ... and apply it to the violin's (secondary) y-axis so both panels line up
    fig.update_yaxes(range=y_range, secondary_y=True)
    
    fig.show()
    

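The range-matching step in the answer can be pulled out into a small helper. This is only a sketch: `padded_range` and the `paths` sample are hypothetical names used for illustration, and the 1/16 padding fraction is taken from the answer's observation about Plotly's default range rather than from Plotly's documentation.

```python
import numpy as np


def padded_range(values, fraction=1 / 16):
    """Return [low, high] covering `values`, widened on each side by
    `fraction` of the data range (max - min). NaNs are ignored.

    The default fraction of 1/16 is an assumption taken from the answer above.
    """
    data = np.asarray(values, dtype=float)
    low = np.nanmin(data)
    high = np.nanmax(data)
    pad = (high - low) * fraction
    return [float(low - pad), float(high + pad)]


# Hypothetical sample data: min is 0, max is 16, so the padding is 16/16 = 1
paths = np.array([
    [0.0, 1.0, 2.0],
    [4.0, 8.0, 16.0],
])
y_range = padded_range(paths)
print(y_range)  # [-1.0, 17.0]
```

With the figure from the answer, the same list could be passed as `fig.update_yaxes(range=padded_range(y_testing.to_numpy()), secondary_y=True)`. Setting the range explicitly on both y-axes, rather than relying on the scatter panel's automatic range, may be the more robust choice if the padding rule does not hold for a given trace type.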