I'm trying to calculate the correlation between two lists every 30 days using the pearsonr function from scipy.
One list consists of dates (called dateValues), and the other consists of sales (called saleNumbers). I already parsed the dates with datetime.strptime, and if I print out dateValues, I get a list of datetime objects of arbitrary length:
datetime.datetime(2016, 8, 12, 0, 0), datetime.datetime(2016, 8, 11, 0, 0), datetime.datetime(2016, 8, 10, 0, 0)...etc
While here is the sales list:
saleNumbers = [3567,2348,1234,....etc]
However, when I run
pearsonr(dateValues,saleNumbers)
I get the error
TypeError: unsupported operand type(s) for +: 'datetime.datetime' and 'datetime.datetime'
After searching endlessly, I found that one can use datetime.date to do arithmetic between dates.
So I did this:
print(datetime.date(dateValues[0]) - datetime.date(dateValues[29]))
And sure enough that gives me 30 days for the time difference.
So I then tried this:
pearsonr(datetime.date(dateValues[0]) - datetime.date(dateValues[29]),saleNumbers)
But I then get this error
TypeError: len() of unsized object
Any ideas on how I can move forward with this? Also, I don't think datetime.date(dateValues[0]) - datetime.date(dateValues[2]) is the correct Pythonic way to handle the dates list when finding the correlation.
PS: This image shows an Excel spreadsheet with what I've already done and am trying to replicate in Python: https://i.sstatic.net/THUoX.jpg
Convert them to numeric values first:
from datetime import datetime
from scipy.stats import pearsonr
arbitrary_date = datetime(1970, 1, 1)
pearsonr([(d - arbitrary_date).total_seconds() for d in dateValues], saleNumbers)
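Pearson correlation is unaffected by translation or positive scaling of either axis (a positive affine transformation), so neither the choice of arbitrary_date nor the time unit (seconds, days) changes the result.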