I'm trying to get a correlation value for a previous week's inputs to the following week's output.
For the sake of this example I've set it up where each week's input will be the following week's Output, and the df.corr()
should give a 1.000000
result.
My original data looks like this:
Date Input Output
1/1/2010 73 73
1/7/2010 2 73
1/13/2010 3 2
1/19/2010 4 3
Full sample data uploaded here: https://drive.google.com/open?id=0B4xdnV0LFZI1MzRUOUJkcUY4ajQ
Here's my code so far:
import pandas as pd
df = pd.read_csv('pearson.csv')
df['Date'] = pd.to_datetime(df['Date'], errors = 'coerce')
df = df.set_index(pd.DatetimeIndex(df['Date']))
df = df[['Input', 'Output']]
x = df.corr(method = 'pearson', min_periods=1)
print(x)
And as a newbie here's where I'm stuck. I don't see a shift
option built in the function and not sure how to do this.
Any and all help is appreciated.
Thank you, Me
If you do .corr
on a dataframe, it will produce a correlation matrix.
In your case, you just want the correlation between two time series and you can achieve this with the code below. Note that the .corr
method for a time series requires the parameter other
, which is the series to compute the correlation with.
df["Input"].corr(df["Output"].shift(-1), method = 'pearson', min_periods = 1) #1
If instead you want the correlation matrix, you should first create a dataframe with shifted Output and then compute the correlation:
temp_df = pd.concat([df['Input'], df['Output'].shift(-1)], axis = 1).dropna()
temp_df.corr(method = 'pearson', min_periods = 1)
# Input Output
#Input 1.0 1.0
#Output 1.0 1.0