
How to add a box plot and a vertical mean line to histogram subplots in Python Plotly (Plotly Express / graph objects)


Below is the data used to create the histogram subplots with Plotly graph objects.

[Image: input data]

The code below creates the histogram subplots with Plotly graph objects.

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# df (shown in the image above) must have 'Generation' and 'NumCompaniesWorked' columns
generations = ['Millenials', 'Generation X', 'Boomers']
fig = make_subplots(rows=1, cols=3,
                    subplot_titles=[f'<b> {gen} </b>' for gen in generations])

# one histogram per generation, placed in columns 1, 2 and 3
for col, gen in enumerate(generations, start=1):
    fig.add_trace(go.Histogram(x=df[df['Generation'] == gen]['NumCompaniesWorked'],
                               opacity=0.5,
                               marker_color='#455f66'),
                  row=1, col=col)

fig.update_layout(
    showlegend=False,
    title=dict(text="<b> Histogram - <br> <span style='color: #f55142'> How to add the box plot and mean vertical line on each diagram </span></b> ",
               font=dict(family="Arial", size=20, color='#283747')))
fig.show()

The code above produces this output:

[Image: three histogram subplots titled Millenials, Generation X and Boomers]

How can I add a vertical line at the mean (average) of each histogram? The mean values are:

  • Millenials = 2.2
  • Generation X = 3.4
  • Boomers = 4.1

I would also like a box plot above each of the three histograms.

Each of the three diagrams should look like this:

[Image: desired layout, a box plot above a histogram with a vertical line at the mean]
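The quoted means are count-weighted averages of NumCompaniesWorked, i.e. the sum of value × count divided by the total count. As a quick check, here is a small sketch that recomputes them from the per-generation counts listed in the solution below; the `counts` dict and the `weighted_mean` helper are illustrative names, not part of the original post.

```python
# Employee counts for NumCompaniesWorked = 0..9, copied from the solution's DataFrame
counts = {
    'Millenials':   [139, 407, 54, 57, 55, 32, 35, 28, 17, 24],
    'Generation X': [53, 108, 83, 90, 70, 27, 32, 40, 26, 24],
    'Boomers':      [5, 6, 9, 12, 14, 4, 3, 6, 6, 4],
}


def weighted_mean(freqs):
    """Mean of x = 0..len(freqs)-1, where value x occurs freqs[x] times."""
    total = sum(freqs)
    return sum(x * n for x, n in enumerate(freqs)) / total


means = {gen: round(weighted_mean(c), 1) for gen, c in counts.items()}
print(means)  # {'Millenials': 2.2, 'Generation X': 3.4, 'Boomers': 4.1}
```

For Millenials, for example, this is 1824 / 848 ≈ 2.15, which rounds to 2.2.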


Solution

    import pandas as pd
    import plotly.express as px

    # original df: counts per NumCompaniesWorked value, one column per generation
    df = pd.DataFrame({'NumCompaniesWorked': list(range(10)),
                       'Millenials': [139, 407, 54, 57, 55, 32, 35, 28, 17, 24],
                       'Generation X': [53, 108, 83, 90, 70, 27, 32, 40, 26, 24],
                       'Boomers': [5, 6, 9, 12, 14, 4, 3, 6, 6, 4]})

    # reorganizing df into long format: NumCompaniesWorked, count, Generation
    dfs = []
    for col in ['Millenials', 'Generation X', 'Boomers']:
        dfs.append(df[['NumCompaniesWorked', col]].rename(columns={col: 'count'}).assign(Generation=col))
    df = pd.concat(dfs)

    print(df)
    # output
    #    NumCompaniesWorked  count    Generation
    # 0                   0    139    Millenials
    # 1                   1    407    Millenials
    # 2                   2     54    Millenials
    # 3                   3     57    Millenials
    # 4                   4     55    Millenials
    # 5                   5     32    Millenials
    # 6                   6     35    Millenials
    # 7                   7     28    Millenials
    # 8                   8     17    Millenials
    # 9                   9     24    Millenials
    # 0                   0     53  Generation X
    # 1                   1    108  Generation X
    # 2                   2     83  Generation X
    # 3                   3     90  Generation X
    # 4                   4     70  Generation X
    # 5                   5     27  Generation X
    # 6                   6     32  Generation X
    # 7                   7     40  Generation X
    # 8                   8     26  Generation X
    # 9                   9     24  Generation X
    # 0                   0      5       Boomers
    # 1                   1      6       Boomers
    # 2                   2      9       Boomers
    # 3                   3     12       Boomers
    # 4                   4     14       Boomers
    # 5                   5      4       Boomers
    # 6                   6      3       Boomers
    # 7                   7      6       Boomers
    # 8                   8      6       Boomers
    # 9                   9      4       Boomers

    # y='count' makes px.histogram sum the counts in each bin (histfunc='sum');
    # marginal='box' adds a box plot above each facet
    fig = px.histogram(df,
                       x='NumCompaniesWorked',
                       y='count',
                       marginal='box',
                       facet_col='Generation')

    # dashed vertical line at each generation's mean; col=N targets the N-th facet column
    # (facets follow the order in which the generations appear in df)
    fig.add_vline(x=2.2, line_width=1, line_dash='dash', line_color='gray', col=1)
    fig.add_vline(x=3.4, line_width=1, line_dash='dash', line_color='gray', col=2)
    fig.add_vline(x=4.1, line_width=1, line_dash='dash', line_color='gray', col=3)

    fig.show()
    

    [Figure: output of the solution code]