I want to annotate a plot of a multivariate time series with time intervals (coloured by annotation type).
An example dataset looks like this:
             metrik_0   metrik_1   metrik_2  geospatial_id  topology_id  cohort_id  device_id
2020-01-01  -0.848009   1.305906   0.924208             12            4          1          1
2020-01-01  -0.516120   0.617011   0.623065              8            3          1          9
2020-01-01   0.762399  -0.359898  -0.905238             19            3          2         13
2020-01-01   0.708512  -1.502019  -2.677056              8            4          2          8
2020-01-01   0.249475   0.590983  -0.677694             11            3          1         12
The labels look like this:
   cohort_id  marker_type                start                  end
0          1            a  2020-01-02 00:00:00                  NaT
1          1            b  2020-01-04 05:00:00  2020-01-05 16:00:00
2          1            a  2020-01-06 00:00:00                  NaT
a
(configured by the number of hours)
I thought about using seaborn/matplotlib for this task.
So far I have come up with:
%pylab inline
import seaborn as sns; sns.set()
import matplotlib.dates as mdates
aut_locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
aut_formatter = mdates.ConciseDateFormatter(aut_locator)
g = df[df['cohort_id'] == 1].plot(figsize=(8,8))
g.xaxis.set_major_locator(aut_locator)
g.xaxis.set_major_formatter(aut_formatter)
plt.show()
which is rather chaotic. I fear it will not be possible to fit all the metrics (multivariate data) into a single plot; it should be faceted by column. However, that would require reshaping the dataframe for seaborn's FacetGrid to work, which also doesn't feel quite right, especially as the number of elements (time series) in a cohort_id grows. If FacetGrid is the right way, then something along the lines of https://seaborn.pydata.org/examples/timeseries_facets.html would be the first part, but the labels would still be missing.
How could the labels be added? How should the first part be accomplished?
An example of the desired result:
https://i.sstatic.net/JYilG.jpg, i.e. one plot like this for each metric value.
The datasets are generated from the code snippet below:
import pandas as pd
import numpy as np
import random
random_seed = 47
np.random.seed(random_seed)
random.seed(random_seed)
def generate_df_for_device(n_observations, n_metrics, device_id, geo_id, topology_id, cohort_id):
    # One device: hourly random metrics plus constant ID columns.
    # 'h' is the hourly alias (uppercase 'H' is deprecated in recent pandas).
    df = pd.DataFrame(np.random.randn(n_observations, n_metrics),
                      index=pd.date_range('2020', freq='h', periods=n_observations))
    df.columns = [f'metrik_{c}' for c in df.columns]
    df['geospatial_id'] = geo_id
    df['topology_id'] = topology_id
    df['cohort_id'] = cohort_id
    df['device_id'] = device_id
    return df

def generate_multi_device(n_observations, n_metrics, n_devices, cohort_levels, topo_levels):
    # Stack the frames of n_devices devices with random geo/topology/cohort IDs.
    results = []
    for i in range(1, n_devices + 1):
        r = random.randrange(1, n_devices)
        cohort = random.randrange(1, cohort_levels)
        topo = random.randrange(1, topo_levels)
        df_single_device = generate_df_for_device(n_observations, n_metrics, i, r, topo, cohort)
        results.append(df_single_device)
    return pd.concat(results)
# hourly data, 1 week of data
n_observations = 7 * 24
n_metrics = 3
n_devices = 20
cohort_levels = 3
topo_levels = 5
df = generate_multi_device(n_observations, n_metrics, n_devices, cohort_levels, topo_levels)
df = df.sort_index()
df.head()
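# Timestamps share one format so pd.to_datetime can parse them without falling back to per-element guessing.
marker_labels = pd.DataFrame({'cohort_id': [1, 1, 1], 'marker_type': ['a', 'b', 'a'], 'start': ['2020-01-02 00:00', '2020-01-04 05:00', '2020-01-06 00:00'], 'end': [np.nan, '2020-01-05 16:00', np.nan]})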
marker_labels['start'] = pd.to_datetime(marker_labels['start'])
marker_labels['end'] = pd.to_datetime(marker_labels['end'])
In general, you can use plt.fill_between for horizontal bands and plt.fill_betweenx for vertical bands. For "bands-within-bands" you can just call the method twice.
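To make the difference between the two calls concrete, here is a minimal sketch on a toy sine curve. The data, band positions and colours are assumptions chosen only for illustration; they are not taken from your dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data (assumption): a sine wave on a numeric x-axis.
x = np.linspace(0, 10, 200)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(x, y)

# Horizontal band: for every x, shade between two y values.
h_band = ax.fill_between(x, -0.5, 0.5, color="tab:orange", alpha=0.2)

# Vertical band: for every y, shade between two x values.
v_band = ax.fill_betweenx([-1.5, 1.5], 2.0, 4.0, color="gray", alpha=0.2)

# "Band within a band": a second, narrower call darkens the overlap.
inner = ax.fill_betweenx([-1.5, 1.5], 2.5, 3.5, color="gray", alpha=0.4)

ax.set_ylim(-1.5, 1.5)
```

If you do not want to pass explicit limits for the other axis, ax.axhspan and ax.axvspan draw bands that always span the full width or height of the axes.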
A basic example using your data would look like this. I've used fixed values for the position of the bands, but you can put them on the main dataframe and reference them dynamically inside the loop.
import matplotlib.pyplot as plt

# One subplot per metric, sharing the time axis.
fig, ax = plt.subplots(3, figsize=(20, 9), sharex=True)
plt.subplots_adjust(hspace=0.2)
metriks = ["metrik_0", "metrik_1", "metrik_2"]
colors = ['#66c2a5', '#fc8d62', '#8da0cb']  # Set2 palette hexes
for i, metric in enumerate(metriks):
    df[[metric]].plot(ax=ax[i], color=colors[i], legend=None)
    ax[i].set_ylabel(metric)
    # Outer band: light gray over the whole labelled interval.
    ax[i].fill_betweenx(y=[-3, 3], x1="2020-01-04 05:00:00",
                        x2="2020-01-05 16:00:00", color='gray', alpha=0.2)
    # Inner band: a second, darker call on top of the first.
    ax[i].fill_betweenx(y=[-3, 3], x1="2020-01-04 15:00:00",
                        x2="2020-01-05 00:00:00", color='gray', alpha=0.4)
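To reference the bands dynamically instead of hard-coding them, one option is to loop over the rows of marker_labels. The sketch below makes a few assumptions that are not in your question: a missing end is replaced by start plus a fallback duration (DEFAULT_HOURS, a made-up value), MARKER_COLORS is a made-up colour mapping, and the plotted frame is a small stand-in with one row per timestamp. It also swaps fill_betweenx for ax.axvspan, which spans the full height of each subplot so no y limits need to be guessed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data (assumption): one cohort, one row per hourly timestamp.
rng = np.random.default_rng(47)
index = pd.date_range("2020-01-01", periods=7 * 24, freq=pd.Timedelta(hours=1))
data = pd.DataFrame(rng.standard_normal((len(index), 3)),
                    index=index, columns=["metrik_0", "metrik_1", "metrik_2"])

marker_labels = pd.DataFrame({
    "cohort_id": [1, 1, 1],
    "marker_type": ["a", "b", "a"],
    "start": pd.to_datetime(["2020-01-02 00:00", "2020-01-04 05:00", "2020-01-06 00:00"]),
    "end": pd.to_datetime([None, "2020-01-05 16:00", None]),
})

# Hypothetical settings, not from the question.
DEFAULT_HOURS = 12
MARKER_COLORS = {"a": "tab:red", "b": "tab:blue"}


def resolve_intervals(labels, default_hours=DEFAULT_HOURS):
    """Return a copy of labels where a missing end becomes start + default_hours."""
    out = labels.copy()
    out["end"] = out["end"].fillna(out["start"] + pd.Timedelta(hours=default_hours))
    return out


def plot_with_markers(data, labels, metrics, colors=MARKER_COLORS):
    """Draw one subplot per metric and shade every labelled interval on each."""
    fig, axes = plt.subplots(len(metrics), figsize=(12, 3 * len(metrics)),
                             sharex=True, squeeze=False)
    axes = axes[:, 0]
    for ax, metric in zip(axes, metrics):
        ax.plot(data.index, data[metric], linewidth=0.8)
        ax.set_ylabel(metric)
        for row in labels.itertuples(index=False):
            # One vertical band per label row, coloured by marker type.
            ax.axvspan(row.start, row.end, color=colors[row.marker_type], alpha=0.25)
    return fig, axes


intervals = resolve_intervals(marker_labels[marker_labels["cohort_id"] == 1])
fig, axes = plot_with_markers(data, intervals, list(data.columns))
```

With the real frame you would first select one cohort (and one device, or an aggregate over devices) so that each timestamp appears once per line, then pass that selection in place of data.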