python, plotly, data-visualization, plotly-python

Ignore non-trading days (holidays / remove gaps) in Plotly candlesticks for minute / hour data


This answer says to use xaxis=dict(type="category"), but I don't know where to pass that argument (I have just moved over from matplotlib, only for candlesticks).

Following some links, I found that running the code below on daily data easily removes the gaps:


    import pandas as pd
    import plotly.graph_objects as go

    # stocks: OHLC DataFrame (column names as in the sample further down)
    # freq: bar interval in minutes; both are defined elsewhere
    # every timestamp expected between the first and last rows
    dt_all = pd.date_range(start=stocks.iloc[0, 0], end=stocks.iloc[-1, 0], freq=f'{freq}min')
    # timestamps actually present in the data
    dt_obs = [d.strftime("%Y-%m-%d %H:%M:%S") for d in pd.to_datetime(stocks['DATE'])]
    # expected timestamps that have no data
    dt_breaks = [d for d in dt_all.strftime("%Y-%m-%d %H:%M:%S").tolist() if d not in dt_obs]

    range_selector = dict(buttons=[
        dict(count=5, label='5Min', step='minute', stepmode='backward'),
        dict(count=15, label='15Min', step='minute', stepmode='backward'),
        dict(count=75, label='75M', step='minute', stepmode='backward'),
        dict(count=1, label='1D', step='day', stepmode='backward'),
        dict(step='all')])

    candle = go.Figure(data=[go.Candlestick(opacity=0.9, x=stocks['DATE'], name='X',
                                            open=stocks['OPEN'],
                                            high=stocks['HIGH'],
                                            low=stocks['LOW'],
                                            close=stocks['CLOSE'])])

    candle.update_xaxes(
        title_text='Date',
        rangeslider_visible=True,
        rangebreaks=[dict(values=dt_breaks)],
        rangeselector=range_selector)  # the axis property is named rangeselector

But my data is at 5-minute intervals:

         DATE                       OPEN    HIGH    LOW     CLOSE   52W H  52W L  SYMBOL
    374  2022-01-14 15:25:00+05:30  720.25  722.35  720.25  721.55  NaN    NaN    BHARTIARTL
    373  2022-01-14 15:20:00+05:30  720.30  720.45  719.45  720.25  NaN    NaN    BHARTIARTL
    372  2022-01-14 15:15:00+05:30  720.75  720.90  720.15  720.30  NaN    NaN    BHARTIARTL
    371  2022-01-14 15:10:00+05:30  720.35  720.90  720.20  720.70  NaN    NaN    BHARTIARTL
    370  2022-01-14 15:05:00+05:30  720.70  720.90  720.05  720.20  NaN    NaN    BHARTIARTL
    ...  ...                        ...     ...     ...     ...     ...    ...    ...
    4    2022-01-10 09:35:00+05:30  706.05  707.15  705.65  706.55  NaN    NaN    BHARTIARTL
    3    2022-01-10 09:30:00+05:30  705.90  706.40  705.05  706.05  NaN    NaN    BHARTIARTL
    2    2022-01-10 09:25:00+05:30  707.10  707.95  705.60  705.60  NaN    NaN    BHARTIARTL
    1    2022-01-10 09:20:00+05:30  709.00  709.40  706.15  707.10  NaN    NaN    BHARTIARTL
    0    2022-01-10 09:15:00+05:30  705.40  709.00  705.40  708.55  NaN    NaN    BHARTIARTL

Using the above code on this data gives me this result:

[image]

What can be done in this case?


Solution

  • There are other answers to "Plotly: How to remove empty dates from x axis" that will suit your use case better. This can be a bit tricky for 5-minute intervals. Just pay attention to the format of your timestamps, and follow these steps carefully:

    • build the complete series of time intervals from your first observation to your last
    • find out which of your observations occur in that complete timeline
    • isolate the remaining timestamps and include them in the rangebreaks attribute of the x-axis
    • set the dvalue attribute of rangebreaks to your time interval in milliseconds (5 minutes = 5*60*1000 = 300000 ms) with fig.update_xaxes(rangebreaks=[dict(dvalue = 5*60*1000, values=dt_breaks)])

    Essential code elements:

    # grab the first and last observations from df['Date'] and build a continuous date range from them
    dt_all = pd.date_range(start=df['Date'].iloc[0],end=df['Date'].iloc[-1], freq = '5min')
    
    # format the timestamps that actually occur in your source data
    dt_obs = [d.strftime("%Y-%m-%d %H:%M:%S") for d in df['Date']]
    
    # isolate the timestamps in the continuous range that are missing from your data
    dt_breaks = [d for d in dt_all.strftime("%Y-%m-%d %H:%M:%S").tolist() if d not in dt_obs]
    
    # hide the missing timestamps; dvalue is the width of each break in milliseconds
    fig.update_xaxes(rangebreaks=[dict(dvalue = 5*60*1000, values=dt_breaks)])
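
The snippet above takes the first row as the start of the range and the last row as the end. If the frame is sorted newest-first, as the printout in the question suggests, the start would be later than the end, and as far as I know pd.date_range then returns an empty range, so no breaks would be found. The following is only a minimal sketch of the same steps using the standard library, taking the earliest and latest timestamps regardless of row order. The helper name missing_timestamps and the sample timestamps are hypothetical, and the timestamps are assumed to be naive wall-clock times that all fall on the 5-minute grid.

```python
from datetime import datetime, timedelta

def missing_timestamps(observed, step):
    """Return the grid points between the earliest and latest observation
    that have no observation (hypothetical helper, for illustration only)."""
    seen = set(observed)
    current, last = min(seen), max(seen)  # works for any row order
    missing = []
    while current <= last:
        if current not in seen:
            missing.append(current)
        current += step
    return missing

step = timedelta(minutes=5)

# Hypothetical newest-first sample with an overnight gap
observed = [
    datetime(2022, 1, 11, 9, 20),
    datetime(2022, 1, 11, 9, 15),
    datetime(2022, 1, 10, 15, 25),
    datetime(2022, 1, 10, 15, 20),
]

# Same string format as in the snippet above
dt_breaks = [d.strftime("%Y-%m-%d %H:%M:%S") for d in missing_timestamps(observed, step)]

# Width of one break in milliseconds, for the dvalue attribute
dvalue_ms = int(step.total_seconds() * 1000)
```

The resulting list can be passed as values=dt_breaks, together with dvalue=dvalue_ms, in the fig.update_xaxes(rangebreaks=[...]) call shown above. With pandas, sorting the frame by date first, or using df['Date'].min() and df['Date'].max() as the range endpoints, should have the same effect.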
    

    Plot 1: Missing timestamps are showing


    Plot 2: Missing timestamps are not showing


    Complete code:

    import plotly.graph_objects as go
    import pandas as pd
    import numpy as np
    
    # sample data
    df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv').tail(90)
    df = df[df.columns[:6]]
    df['Date'] = pd.date_range("2018-01-01", periods=len(df), freq="5min")
    df.columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume']
    df = df.tail(10)
    
    # remove some data
    np.random.seed(0)
    remove_n = 4
    drop_indices = np.random.choice(df.index, remove_n, replace=False)
    df = df.drop(drop_indices)
    
    # plotly candlestick figure
    fig = go.Figure(data=[go.Candlestick(
        x=df['Date'],
        open=df['Open'], high=df['High'],
        low=df['Low'], close=df['Close'],
    )])
    
    # grab the first and last observations from df['Date'] and build a continuous date range from them
    dt_all = pd.date_range(start=df['Date'].iloc[0],end=df['Date'].iloc[-1], freq = '5min')
    
    # format the timestamps that actually occur in your source data
    dt_obs = [d.strftime("%Y-%m-%d %H:%M:%S") for d in df['Date']]
    
    # isolate the timestamps in the continuous range that are missing from your data
    dt_breaks = [d for d in dt_all.strftime("%Y-%m-%d %H:%M:%S").tolist() if d not in dt_obs]
    dt_breaks = pd.to_datetime(dt_breaks)
    
    # Plot 1: missing timestamps still show as gaps
    fig.show()
    
    # Plot 2: hide the missing timestamps; dvalue is the width of each break in milliseconds
    fig.update_xaxes(rangebreaks=[dict(dvalue = 5*60*1000, values=dt_breaks)] )
    print(fig.layout.xaxis.rangebreaks)
    fig.show()