Search code examples
pythonpython-3.xpandaspearson-correlation

How to Offset Pandas Pearson Correlation with Datetime Index


I'm trying to get a correlation value for a previous week's inputs to the following week's output.

For the sake of this example I've set it up where each week's input will be the following week's Output, and the df.corr() should give a 1.000000 result.

My original data looks like this:

Date      Input     Output
1/1/2010    73         73
1/7/2010     2         73
1/13/2010    3          2
1/19/2010    4          3

Full sample data uploaded here: https://drive.google.com/open?id=0B4xdnV0LFZI1MzRUOUJkcUY4ajQ

Here's my code so far:

import pandas as pd
df = pd.read_csv('pearson.csv')
df['Date'] = pd.to_datetime(df['Date'], errors = 'coerce')
df = df.set_index(pd.DatetimeIndex(df['Date']))
df = df[['Input', 'Output']]
x = df.corr(method = 'pearson', min_periods=1)
print(x)

And as a newbie here's where I'm stuck. I don't see a shift option built in the function and not sure how to do this.

Any and all help is appreciated.

Thank you, Me


Solution

  • If you do .corron a dataframe, it will produce a correlation matrix.

    In your case, you just want the correlation between two time series and you can achieve this with the code below. Note that the .corr method for a time series requires the parameter other, which is the series to compute the correlation with.

    df["Input"].corr(df["Output"].shift(-1), method = 'pearson', min_periods = 1) #1
    

    If instead you want the correlation matrix, you should first create a dataframe with shifted Output and then compute the correlation:

    temp_df = pd.concat([df['Input'], df['Output'].shift(-1)], axis = 1).dropna()
    temp_df.corr(method = 'pearson', min_periods = 1)   
    
    #        Input  Output
    #Input     1.0     1.0
    #Output    1.0     1.0