Financial time series are often fraught with missing data. Out of the box, plotly handles a series with missing timestamps by simply drawing a straight line across each gap. The catch is that plotly treats the timestamps as values on a continuous date axis, so every missing date still takes up space in the figure.
Most of the time, I find that the plot looks better with those dates left out completely. An example from the plotly docs at https://plotly.com/python/time-series/#hiding-weekends-and-holidays shows how to hide certain categories of dates, like weekends or holidays, using:
fig.update_xaxes(
    rangebreaks=[
        dict(bounds=["sat", "mon"]),  # hide weekends
        dict(values=["2015-12-25", "2016-01-01"])  # hide Christmas and New Year's
    ]
)
The downside here is that your dataset may just as well be missing data on any other weekday. You would also have to list the holidays for each country by hand. So are there any other approaches? The following snippet builds a small series with missing dates and plots it with plotly's default behavior:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
# data
np.random.seed(1234)
n_obs = 15
frequency = 'D'
daterange = pd.date_range('2020', freq=frequency, periods=n_obs)
values = np.random.randint(low=-5, high=6, size=n_obs).tolist()
df = pd.DataFrame({'time':daterange, 'value':values})
df = df.set_index('time')
df.iloc[0]=100; df['value']=df.value.cumsum()
# Missing timestamps
df.iloc[2:5] = np.nan; df.iloc[8:13] = np.nan
df.dropna(inplace = True)
# plotly figure
fig=go.Figure(go.Scatter(x=df.index, y =df['value']))
fig.update_layout(template = 'plotly_dark')
fig.show()
The key here is still to use the rangebreaks attribute. But if you were to follow the approach in the linked example, you'd have to include each missing date manually. The solution to missing data in this case is actually more missing data, and this is why:
1. You can retrieve the timestamps from the beginning and the end of your series, and then
2. build a complete timeline within that period (with possibly more missing dates) using:
dt_all = pd.date_range(start=df.index[0],
                       end=df.index[-1],
                       freq='D')
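3. Next you can isolate the timestamps in that timeline that are not in df.index (dt_all_py and dt_obs_py are dt_all and df.index converted to Python datetimes, as in the complete code further down) using: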
dt_breaks = [d for d in dt_all_py if d not in dt_obs_py]
4. And finally you can include those timestamps in rangebreaks like so:
fig.update_xaxes(
    rangebreaks=[dict(values=dt_breaks)]
)
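As a side note, steps 1 to 3 can also be sketched with pandas alone, without converting to Python datetimes. This is only a minimal sketch under my own assumptions: the `observed` index below is a hypothetical stand-in for `df.index` (the same 15 daily timestamps with the same two gaps), and using `Index.difference` is my substitution for the list comprehension above, not something taken from the plotly docs.

```python
import pandas as pd

# Hypothetical stand-in for df.index: 15 daily timestamps with two gaps removed
full = pd.date_range("2020-01-01", periods=15, freq="D")
observed = full.delete(list(range(2, 5)) + list(range(8, 13)))

# Steps 1 and 2: complete timeline between the first and last observation
dt_all = pd.date_range(start=observed[0], end=observed[-1], freq="D")

# Step 3: timestamps in the complete timeline that were never observed
dt_breaks = dt_all.difference(observed)

# Step 4 would hand these to plotly as date strings, for example:
# fig.update_xaxes(rangebreaks=[dict(values=break_values)])
break_values = dt_breaks.strftime("%Y-%m-%d").tolist()
print(break_values)
```

The set difference keeps the result sorted and avoids the repeated list lookups, which should matter only for longer series.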
import pandas as pd
import numpy as np
import plotly.graph_objects as go
# data
np.random.seed(1234)
n_obs = 15
frequency = 'D'
daterange = pd.date_range('2020', freq=frequency, periods=n_obs)
values = np.random.randint(low=-5, high=6, size=n_obs).tolist()
df = pd.DataFrame({'time':daterange, 'value':values})
df = df.set_index('time')
df.iloc[0]=100; df['value']=df.value.cumsum()
# Missing timestamps
df.iloc[2:5] = np.nan; df.iloc[8:13] = np.nan
df.dropna(inplace = True)
# plotly figure
fig=go.Figure(go.Scatter(x=df.index, y =df['value']))
fig.update_layout(template = 'plotly_dark')
# complete timeline between first and last timestamps
dt_all = pd.date_range(start=df.index[0],
                       end=df.index[-1],
                       freq=frequency)
# make sure input and synthetic time series are of the same types
dt_all_py = [d.to_pydatetime() for d in dt_all]
dt_obs_py = [d.to_pydatetime() for d in df.index]
# find which timestamps are missing in the complete timeline
dt_breaks = [d for d in dt_all_py if d not in dt_obs_py]
# remove missing timestamps from visualization
fig.update_xaxes(
    rangebreaks=[dict(values=dt_breaks)]  # hide timestamps with no values
)
#fig.update_layout(title=dict(text="Some dates are missing, but still displayed"))
fig.update_layout(title=dict(text="Missing dates are excluded by rangebreaks"))
fig.update_xaxes(showgrid=False)
fig.show()
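One caveat if you change `frequency` to something finer than daily: as far as I know, each entry in a rangebreak's `values` hides an interval whose width is set by `dvalue` in milliseconds, and the default is one day. For hourly data you would therefore probably want to set `dvalue` to one hour. Below is a rough sketch of that idea. The helper name `gap_rangebreak` and the hourly sample index are my own hypothetical additions, not part of plotly or pandas, and I have only reasoned this through for regularly spaced series.

```python
import pandas as pd

def gap_rangebreak(index, step):
    """Hypothetical helper: build one rangebreaks entry that hides every
    `step`-sized slot between the first and last timestamp of `index`
    for which no observation exists."""
    dt_all = pd.date_range(start=index.min(), end=index.max(), freq=step)
    missing = dt_all.difference(index)
    return dict(
        values=missing.strftime("%Y-%m-%d %H:%M:%S").tolist(),
        # width of each hidden slot in milliseconds
        dvalue=int(step.total_seconds() * 1000),
    )

# Hypothetical hourly series with the observations at 02:00, 03:00 and 04:00 missing
hourly = pd.date_range("2020-01-01", periods=8, freq=pd.Timedelta(hours=1))
observed = hourly.delete([2, 3, 4])

hourly_break = gap_rangebreak(observed, pd.Timedelta(hours=1))
print(hourly_break)
```

You would then pass the result to the figure with fig.update_xaxes(rangebreaks=[hourly_break]), exactly as with the daily breaks above.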