Search code examples
pythontime-seriescorrelationp-valuesignificance

p-value for time series per item of matrix (3-dim)


I would like to correlate time series (540 months long) per/for each single grid box, looping over 2 latitudes x 6 longitudes = 12 grid boxes in total. I manage to get the correlation coefficient, but the p-value turns out to be "nan" even though they should not.

Info on my data:

  • TheData1 - Shape: (540, 2, 6) Dimension: 3
  • TheData2 - Shape: (540, 2, 6) Dimension: 3

I tried:

  1. np.corrcoef(TheData1[gotdata,lt,ln],TheData2[gotdata,lt,ln])[0,1] works fine to get the correlation coefficient, but not for the p-value.
  2. slope,intercept,r_value,p_value,std_err=scipy.stats.linregress(TheData1[gotdata,lt,ln],TheData2[gotdata,lt,ln]) gives pvalue=nan.
  3. scipy.stats.pearsonr(TheData1[gotdata,lt,ln],TheData2[gotdata,lt,ln]) gives the error

"ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"

TheMDI stands for -1e30, missing data.

Code:

for lt in range(NLats): # range nlats goes through 0...36
    for ln in range(NLons): # range nlons goes through 0...72

# Short way
        # create a map pointing to the elements of the array that are non-missing
        gotdata = np.where((TheData1[:,lt,ln] > TheMDI) & (TheData2[:,lt,ln] > TheMDI))

        OutputData[lt,ln] = np.corrcoef(TheData1[gotdata,lt,ln]/np.std(TheData1[gotdata,lt,ln]),TheData2[gotdata,lt,ln]/np.std(TheData2[gotdata,lt,ln]))[0,1]
    OutputDataSig[lt,ln]=scipy.stats.linregress(TheData1[gotdata,lt,ln],TheData2[gotdata,lt,ln])[3]

Solution

  • Solved by choosing the very first entry ([0]) of gotdata only:

    gotdata = np.where((TheData1[:,lt,ln] > TheMDI) & (TheData2[:,lt,ln] > TheMDI))[0]
    

    This makes scipy.stats.pearsonr giving out a p-value.