python-3.x pandas dataframe pearson-correlation

Compute Correlation Dataframe for each Vector Row by Index Python


I have a dataframe with 500 columns indexed by date, with four years of data.

| Date | A | AAL | AAP | AAPL | ABC | ... |
| 1/2/2004 | 18.442521 | 25.954398 | 1.38449 | 11.528444 | ... |
| 1/5/2004 | 18.922795 | 25.718507 | 1.442394 | 11.919131 | ... |
| 1/6/2004 | 19.518334 | 26.177538 | 1.437189 | 11.870028 | ... |
... etc.

I would like to calculate the Pearson correlation matrix for each day, so each row. I want to save the matrices by date, in the most space efficient manner readable by R. (Right now my goal is separate sheets, by index date, in Excel. I am open to suggestions.)

I have tried several ways, but this seemed the most promising, because I could not apply the corr() to a df.groupby.

However this method returned empty dataframes, and now I am stuck! I am looking for a method that doesn't involve iteration.

def do_Corr(df_group):
    """Apply the function to each group in the data and return one result."""
    X = df_group.corr()
    return X

df.groupby([df.index.year,df.index.month,df.index.day]).apply(do_Corr).dropna()

Solution

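  • You probably want df.T.corr(). .T transposes the dataframe so that rows become columns, and .corr() then computes the pairwise Pearson correlation between those columns. The result is a single date-by-date matrix rather than one matrix per day.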
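
To illustrate the difference, here is a minimal sketch on made-up data (the random values, the four dates, and the names one_day and by_date are assumptions for the example, not the asker's data). It also suggests why the groupby attempt came back empty: each daily group holds a single row, and a correlation computed from one observation is NaN everywhere, so .dropna() discards it all.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 4 trading days x 5 tickers, random values for illustration.
rng = np.random.default_rng(0)
dates = pd.to_datetime(["2004-01-02", "2004-01-05", "2004-01-06", "2004-01-07"])
df = pd.DataFrame(
    rng.normal(20.0, 1.0, size=(4, 5)),
    index=dates,
    columns=["A", "AAL", "AAP", "AAPL", "ABC"],
)

# A single day is one observation per ticker, so every correlation is NaN.
one_day = df.iloc[[0]].corr()
print(one_day.isna().all().all())

# Transposing makes each date a column; corr() then compares dates pairwise
# across the tickers and returns a dates x dates matrix.
by_date = df.T.corr()
print(by_date.shape)
```

If a plain-text file is acceptable for R, something like by_date.to_csv("corr_by_date.csv") (file name hypothetical) should be readable with read.csv, though whether that is the most space-efficient option depends on the data.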