
Data Availability Chart in Python


I am wondering whether Python has a library for plotting the data availability of a time series with multiple variables. An example, taken from Visavail.js - A Time Data Availability Chart, is shown below.


Solution

  • Here's a suggestion using plotly in a Jupyter Notebook:

    Code:

    import random
    import pandas as pd
    import plotly.express as px
    from random import choices
    
    # random data with a somewhat higher probability of
    # 1 (available) than 0 (missing) to mimic the OP's data
    random.seed(1)
    vals=[0,1]
    prob=[0.4, 0.6]
    
    data=[]
    for i in range(0,5):
        data.append([choices(vals, prob)[0] for c in range(0,10)])
    
    # organize data in a pandas dataframe
    df=pd.DataFrame(data).T
    df.columns=['Balance Sheet', 'Closing Price', 'Weekly Report', 'Analyst Data', 'Annual Report']
    drng=pd.date_range('2080-01-01', periods=df.shape[0]).tolist()
    df['date']=[d.strftime('%Y-%m-%d') for d in drng]
    dfm=pd.melt(df, id_vars=['date'], value_vars=df.columns[:-1])
    
    # plotly express
    fig = px.bar(dfm, x="date", y="variable", color='value', orientation='h',
                 hover_data=["date"],
                 height=600,
                 color_continuous_scale=['firebrick', '#2ca02c'],
                 title='Data Availability Plot',
                 template='plotly_white',
                )
    
    fig.update_layout(
        yaxis=dict(title=''),
        xaxis=dict(title='', showgrid=False, gridcolor='grey', tickvals=[]),
    )
    fig.show()
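
The snippet above uses random 0/1 values. With real data, the availability flags would usually have to be derived from the series themselves. Below is a minimal sketch of one way to do that, assuming a missing observation is stored as NaN. The `raw` DataFrame and its values are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical raw time series; NaN marks a missing observation (assumption)
dates = pd.date_range("2080-01-01", periods=6)
raw = pd.DataFrame(
    {
        "Closing Price": [101.2, np.nan, 103.5, 104.1, np.nan, 106.0],
        "Weekly Report": [np.nan, np.nan, 1.0, np.nan, np.nan, 1.0],
    },
    index=dates,
)

# 1 where a value is present, 0 where it is missing
avail = raw.notna().astype(int)

# Reshape to long format with the columns 'date', 'variable' and 'value'
avail_long = avail.rename_axis("date").reset_index().melt(id_vars="date")
avail_long["date"] = avail_long["date"].dt.strftime("%Y-%m-%d")
```

`avail_long` has the same layout as `dfm`, so it should be usable in the `px.bar` call in its place.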