Search code examples
pythonpandasdataframerolling-computationpairwise

Pandas pairwise arithmetic similar to rolling().corr()


I have a dataframe as follows:

fsym                            EOS       BTC       BNB
time                                                   
2018-11-30 00:00:00+00:00 -0.051903 -0.069088 -0.058162
2018-12-01 00:00:00+00:00  0.026936  0.044739  0.040303
2018-12-02 00:00:00+00:00 -0.034843 -0.012935 -0.005900
2018-12-03 00:00:00+00:00 -0.108108 -0.070375 -0.028180
2018-12-04 00:00:00+00:00 -0.048583  0.019509  0.131986

I can calculate a column pairwise correlation simply with:

pt = pt.rolling(3).corr()

which yields:

sym                                 EOS       BTC       BNB
time                      fsym                              
2018-11-30 00:00:00+00:00 EOS        NaN       NaN       NaN
                          BTC        NaN       NaN       NaN
                          BNB        NaN       NaN       NaN
2018-12-01 00:00:00+00:00 EOS        NaN       NaN       NaN
                          BTC        NaN       NaN       NaN
                          BNB        NaN       NaN       NaN
2018-12-02 00:00:00+00:00 EOS   1.000000  0.952709  0.938688
                          BTC   0.952709  1.000000  0.999066
                          BNB   0.938688  0.999066  1.000000
2018-12-03 00:00:00+00:00 EOS   1.000000  0.998738  0.969385
                          BTC   0.998738  1.000000  0.980492
                          BNB   0.969385  0.980492  1.000000
...

How can I similarly calculate the pairwise differences for the dataframe? It would be the equivalent of using a rolling window of 1 I guess.

EDIT: As pointed out in the comments, the above example isn't actually a columnwise correlation which I hadn't noticed.


Solution

  • If you want the 9 columns:

    # test data
    df = pd.DataFrame(np.arange(12).reshape(-1,3), columns=list('abc'))
    
    s = df.values
    new_cols = pd.MultiIndex.from_product([df.columns, df.columns])
    
    pd.DataFrame((s[:,None,:] - s[:, :,  None]).reshape(len(df), -1),
                 index=df.index,
                 columns=new_cols)
    

    Output:

       a        b        c      
       a  b  c  a  b  c  a  b  c
    0  0  1  2 -1  0  1 -2 -1  0
    1  0  1  2 -1  0  1 -2 -1  0
    2  0  1  2 -1  0  1 -2 -1  0
    3  0  1  2 -1  0  1 -2 -1  0