Time series stacking in Python


Hi, I want to stack a time series by year.

https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021JB022650

This is the paper I read. In Fig. 5 they made an annual stack (the paper describes it as "Each subplot of Figure 5 includes the annual stacks of normalized data").

I have a time series like the one below, covering two years, and want to do the same thing in Python.

2011-01-01 0.034
2011-01-02 -0.234
...
2012-12-30 0.363
2012-12-31 0.092

I think I have to split the time series into the 2011 data and the 2012 data and then stack the two series. However, I could not figure out how to stack them.

What code should I use to stack the data annually?


Solution

  • To stack time series data by year, you can use matplotlib and plot each year of data onto the same plot or subplot, one line per year.

    Stacking annual data also raises the question of how to treat leap days. The following code reserves a position on the x-axis for Feb 29 in every year, so non-leap years simply have no data point on that day.
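    To make the leap-day convention concrete, here is a minimal sketch of one way to compute such a 366-slot day number. It is an alternative to the helper in the code below, not the same implementation: it moves each date into a leap reference year and reads off the day of the year. The names `day_slot` and `REFERENCE_LEAP_YEAR` are hypothetical and introduced only for this illustration.

```python
from datetime import date

# A leap year used only as a reference calendar (an assumption of this sketch).
REFERENCE_LEAP_YEAR = 2000

def day_slot(d):
    """Return a 1..366 slot for d, with Feb 29 always occupying slot 60."""
    # Moving the date into a leap year keeps month and day but makes
    # Mar 1 fall on day 61 regardless of the original year.
    return d.replace(year=REFERENCE_LEAP_YEAR).timetuple().tm_yday

for d in (date(2011, 2, 28), date(2011, 3, 1), date(2012, 2, 29), date(2012, 3, 1)):
    print(d.isoformat(), day_slot(d))
# 2011-02-28 59
# 2011-03-01 61
# 2012-02-29 60
# 2012-03-01 61
```

    With this numbering, Mar 1 lands on slot 61 in both 2011 and 2012, so the two years line up on a shared x-axis.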

    I've also tried approximating the awesome layout of the graphs shown in your picture.

    import matplotlib.pyplot as plt
    import datetime
    from calendar import isleap
    import random
    
    # Get the day number (counted from the start of the year) for any datetime object.
    #   Day numbers run from 1 to 366 in every year, so Feb 29 always owns day 60.
    def daysFromYearStart(dt):
        td = dt - datetime.datetime(dt.year,1,1)
        # In non-leap years, skip day 60 for dates from Mar 1 onward (td.days > 58).
        return td.days+2 if not isleap(dt.year) and td.days > 58 else td.days+1
    
    t1 = datetime.datetime(2000,1,1)
    t2 = datetime.datetime(2005,12,31)
    tdelta = t2 - t1
    # Days from t1 to t2 as datetime objs.
    dates = [datetime.datetime(t1.year, 1, 1) + datetime.timedelta(days=k) for k in range(tdelta.days + 1)]
    # Integer day numbers to plot as x-values.
    x = list(map(daysFromYearStart, dates))
    # Index positions of year starts + year end.
    idx = [i for i,v in enumerate(x) if v==1] + [len(dates)]
    
    # Random numeric y values (an upward trend plus noise).
    y = [k + 400*random.random() for k in range(tdelta.days + 1)]
    
    fig, ax = plt.subplots(1,1)
    color_cycler = ['green','blue','red','orange','purple','brown']
    # This stacks lines together on each plot.
    for k in range(len(idx) - 1):
        ax.plot(x[idx[k]:idx[k+1]], y[idx[k]:idx[k+1]], color=color_cycler[k], marker='_')
    
    # Add a legend outside of the plot.
    ax.legend([f'Year {k}' for k in range(t1.year,t2.year + 1)], bbox_to_anchor=(1.02, 1), loc='upper left')
    # Set title and axis labels.
    ax.set_title('Stacked Timeseries Data')
    ax.set_xlabel('Months')
    ax.set_ylabel('Data to be normalized')
    # Set grid lines.
    ax.grid(visible=True, which='major', axis='both', alpha=0.5)
    # Set x-axis major and minor ticks and labels (the labels= argument needs Matplotlib 3.5+).
    ax.set_xticks([1, 92, 183, 275], labels=['Jan','Apr','Jul','Oct'])
    ax.set_xticks([1, 32, 61, 92, 122, 153, 183, 214, 245, 275, 306, 336, 366], minor=True)
    # Set ticks to also display on the top and right sides of plot.
    ax.xaxis.set_ticks_position('both')
    ax.yaxis.set_ticks_position('both')
    # Set ticks to face inward in plot.
    ax.tick_params(axis='both', direction='in', length=10)
    ax.tick_params(axis='both', which='minor', direction='in', length=5)
    # Rotate xlabels.
    ax.set_xticklabels(ax.get_xticklabels(), rotation=30, ha="left")
    # Display properly and show plot.
    fig.tight_layout()
    plt.show()
    

    Here's the output:

    [Image: stacked timeseries plot]
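    The answer above overlays each year on the same axes using random data. As a rough sketch of how the same idea could be applied to a file in the question's "date value" layout, the snippet below splits the records by year on a 366-slot calendar and also averages the years slot by slot. Whether the paper's "annual stack" means overlaying or averaging is not stated in the question, so the averaging step is an assumption. The 2012 values and the function names `parse`, `split_by_year` and `mean_stack` are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Sample in the question's "date value" layout; the 2012 values are made up.
raw = """\
2011-01-01 0.034
2011-01-02 -0.234
2012-01-01 0.100
2012-01-02 0.266
"""

def parse(text):
    """Yield (datetime, float) pairs from 'YYYY-MM-DD value' lines."""
    for line in text.splitlines():
        if line.strip():
            date_str, value_str = line.split()
            yield datetime.strptime(date_str, "%Y-%m-%d"), float(value_str)

def split_by_year(records):
    """Return {year: {day_slot: value}} using a 366-slot calendar."""
    years = defaultdict(dict)
    for dt, value in records:
        # Leap-day convention: Feb 29 always owns slot 60, so Mar 1 is slot 61.
        slot = dt.replace(year=2000).timetuple().tm_yday
        years[dt.year][slot] = value
    return dict(years)

def mean_stack(years):
    """Average across years for every slot that has at least one value."""
    slots = sorted({s for per_year in years.values() for s in per_year})
    return {s: mean(y[s] for y in years.values() if s in y) for s in slots}

years = split_by_year(parse(raw))   # one series per year, ready to overlay
stack = mean_stack(years)           # assumed "stack": mean over the years
print(sorted(years), stack)
```

    Each `years[year]` dictionary can then be drawn with one `ax.plot(...)` call per year, as in the answer, with `stack` added as one more line if an averaged curve is wanted.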