Search code examples
pythonpandasdataframenormalizereindex

Normalizing a data set to look for seasonalities and average python pandas


Using a data set, I wanted to reindex the below around a center date using pandas. For instance, on the first data set I would like to generate an index around 12/22/2009 and give it the value 0 and anything before would be -1,-2,-3,-4 etc and after would be 1,2,3,4,5. Does anyone have any suggestions how to manage the reindexing dynamically? I anticipate having to merge the data sets together using df.merge(how = 'outer') and add an additional column to average the values.

Theoretical Data Set

Reindex to below

enter image description here

Merge and return centered around 0


Solution

  • Is it what you expect?

    Input data:

    >>> df1
             date     value
    0  2009-12-14  4.829653
    1  2009-12-15  4.941522
    2  2009-12-16  4.066554
    3  2009-12-17  4.374824
    4  2009-12-18  4.919813
    5  2009-12-21  4.774057
    6  2009-12-22  4.431368
    7  2009-12-23  4.794788
    8  2009-12-24  4.496606
    9  2009-12-25  4.408936
    10 2009-12-28  4.467351
    11 2009-12-29  4.834504
    
    >>> df2
             date     value
    0  2009-11-24  4.852626
    1  2009-11-25  4.785530
    2  2009-11-26  4.676981
    3  2009-11-27  4.030883
    4  2009-11-30  4.563335
    5  2009-12-01  4.226778
    6  2009-12-02  4.073694
    7  2009-12-03  4.715326
    8  2009-12-04  4.709705
    9  2009-12-07  4.469377
    10 2009-12-08  4.727273
    11 2009-12-09  4.923308
    12 2009-12-10  4.742808
    13 2009-12-11  4.168057
    
    >>> pivot_dates
    ['2009-12-21','2009-12-04']
    
    dfs = [df1[::-1].set_index(df1.index - df1['date'].eq(pivot_dates[0]).argmax() - 1)['value'],
           df2[::-1].set_index(df2.index - df2['date'].eq(pivot_dates[1]).argmax() - 1)['value']]
    
    df = pd.concat(dfs, axis='columns')
    df.columns = ['A', 'B']
    

    Output result:

    >>> df
               A         B
    -9       NaN  4.168057
    -8       NaN  4.742808
    -7       NaN  4.923308
    -6  4.834504  4.727273
    -5  4.467351  4.469377
    -4  4.408936  4.709705
    -3  4.496606  4.715326
    -2  4.794788  4.073694
    -1  4.431368  4.226778
    0   4.774057  4.563335
    1   4.919813  4.030883
    2   4.374824  4.676981
    3   4.066554  4.785530
    4   4.941522  4.852626
    5   4.829653       NaN