Tags: python, pandas, date, plotly, linechart

Create a line as a secondary axis with Plotly


I have the following dataframe:

         date type   value
1  2020-01-01    N    7956
2  2020-01-01    R   55709
3  2020-02-01    N    2513
4  2020-02-01    R   62325
5  2020-03-01    N    1419
6  2020-03-01    R   63745
7  2020-04-01    N     350
8  2020-04-01    R   65164
9  2020-05-01    N   11500
10 2020-05-01    R   65050
11 2020-06-01    N    7208
12 2020-06-01    R   74550
13 2020-07-01    N    2904
14 2020-07-01    R   81158
15 2020-08-01    N   11054
16 2020-08-01    R   80841
17 2020-09-01    N    7020
18 2020-09-01    R   91445
19 2020-10-01    N   25448
20 2020-10-01    R   97776
21 2020-11-01    N    8497
22 2020-11-01    R  122479
23 2020-12-01    N   11154
24 2020-12-01    R  129813

I'm building the visualization below with this dataframe. My code so far is:

import plotly.express as px

# df_vintage is the dataframe shown above
fig = px.bar(df_vintage, x='date', y='value', color='type',
             labels={'type': 'value'}, category_orders={"type": ["R", "N"]})

fig.update_layout(
    title_text='Vintage Analysis',
    template='seaborn',
    margin=dict(l=50, r=50, t=50, b=50),
    legend=dict(yanchor="top", y=0.98, xanchor="left", x=0.02),
)

fig.update_xaxes(
    dtick="M1",
    tickformat="%b\n%Y")

fig.show()

[Image: the resulting bar chart, with the desired line sketched in blue]

I'm trying to add a line showing the month-to-month percentage change of the total value (but without using make_subplots from Plotly). To compute the percentage change:

pct_change = df_vintage.groupby('date')['value'].sum().pct_change().reset_index()

I'm referring to the blue line I drew in the visualization. Any ideas? Thanks in advance.


Solution

  • If you have to avoid make_subplots, you can draw the line with layout shapes (fig.add_shape), iterating through your pct_change DataFrame and drawing a segment between elements i and i+1.

    Unfortunately, the figure has no second y-axis, so the best I could do was set the y-reference of each shape to "paper", which draws the line as if the y-axis ran from 0 to 1 (e.g. a value of 0.5 in pct_change appears halfway up the y-axis).

    Since most of your percent changes are small, I added a constant base_pct_value (set to 0.5) to the percent changes so that they display above the bars while still showing the general trend. I am assuming the shape of the line matters more to you than its exact position on the y-axis.

    import pandas as pd
    import plotly.express as px
    import io
    
    df_vintage = pd.read_csv(
        io.StringIO(
            """date type   value
    1  2020-01-01    N    7956
    2  2020-01-01    R   55709
    3  2020-02-01    N    2513
    4  2020-02-01    R   62325
    5  2020-03-01    N    1419
    6  2020-03-01    R   63745
    7  2020-04-01    N     350
    8  2020-04-01    R   65164
    9  2020-05-01    N   11500
    10 2020-05-01    R   65050
    11 2020-06-01    N    7208
    12 2020-06-01    R   74550
    13 2020-07-01    N    2904
    14 2020-07-01    R   81158
    15 2020-08-01    N   11054
    16 2020-08-01    R   80841
    17 2020-09-01    N    7020
    18 2020-09-01    R   91445
    19 2020-10-01    N   25448
    20 2020-10-01    R   97776
    21 2020-11-01    N    8497
    22 2020-11-01    R  122479
    23 2020-12-01    N   11154
    24 2020-12-01    R  129813"""
        ),
        sep=r"\s+",
    )
    
    fig = px.bar(df_vintage, x='date', y='value', color='type',  
                 labels={'type': 'value'}, category_orders={"type": ["R", "N"]})
    
    fig.update_layout(
        title_text='Vintage Analysis',
        template='seaborn',
        margin=dict(l=50, r=50, t=50, b=50),
        legend=dict(yanchor="top", y=0.98, xanchor="left", x=0.02),
        
    )
    
    fig.update_xaxes(
        dtick="M1",
        tickformat="%b\n%Y")
    
    # sum only the numeric 'value' column per date, then take the month-to-month change
    pct_change = df_vintage.groupby('date')['value'].sum().pct_change().reset_index()
    
    # vertical offset in paper coordinates (between 0 and 1) added to every pct_change value
    base_pct_value = 0.5
    
    # row 0 of pct_change is NaN (there is no previous month), so start at i=1
    for i in range(1, len(pct_change)-1):
        fig.add_shape(type="line",
            x0=pct_change.iloc[i]['date'], 
            y0=pct_change.iloc[i]['value'] + base_pct_value, 
            x1=pct_change.iloc[i+1]['date'], 
            y1=pct_change.iloc[i+1]['value'] + base_pct_value,
            yref="paper",
            line=dict(
                color="skyblue",
                width=10,
            )
        )
        fig.add_annotation(
            x=pct_change.iloc[i]['date'], 
            y=pct_change.iloc[i]['value'] + base_pct_value,
            text=f"{pct_change.iloc[i]['value']:.2f}",
            showarrow=False,
            yref="paper",
            yshift=10
        )
        # last iteration of the loop: also label the final point
        if i+1 == len(pct_change)-1:
            fig.add_annotation(
                x=pct_change.iloc[i+1]['date'],
                y=pct_change.iloc[i+1]['value'] + base_pct_value,
                text=f"{pct_change.iloc[i+1]['value']:.2f}",
                showarrow=False,
                yref="paper",
                yshift=10
            )
    
    fig.show()
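
With a constant offset, small percent changes still look almost flat, because paper coordinates only span 0 to 1. One possible refinement, sketched here on the assumption that only the shape of the trend matters, is to rescale the percent changes linearly into a band of paper coordinates before passing them to add_shape. The helper name to_paper_band and its low/high defaults are hypothetical and not part of Plotly; the input values in the example are made up.

```python
import math


def to_paper_band(values, low=0.5, high=0.95):
    """Linearly map values into [low, high] paper coordinates.

    NaN entries (such as the first pct_change row) are passed through unchanged.
    Hypothetical helper for illustration; not a Plotly function.
    """
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return list(values)
    v_min, v_max = min(finite), max(finite)
    span = v_max - v_min
    scaled = []
    for v in values:
        if math.isnan(v):
            scaled.append(v)
        elif span == 0:
            # All values are equal: place them in the middle of the band.
            scaled.append((low + high) / 2)
        else:
            scaled.append(low + (v - v_min) / span * (high - low))
    return scaled


# Made-up percent changes; the leading NaN mimics the first row of pct_change().
paper_y = to_paper_band([float("nan"), 0.0, 0.1, 0.2], low=0.5, high=0.9)
```

The rescaled values would take the place of `pct_change.iloc[i]['value'] + base_pct_value` for y0 and y1, while the annotation text can keep showing the original percentages.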
    
