So, I am familiar with some more advanced Pandas indexing and with using .apply()
on columns/rows of data, but for some reason I cannot figure out how to get a variance-covariance matrix for some columns manually without reverting to .corr()
.
Whenever I try the following to slice the columns I want (all rows, and all columns after "APPL R"),
a = np.cov(data1.loc[:,'AAPL R':])
I get an answer like
array([[ 7.27823733e-04, 2.26078140e-04, 1.99410574e-04, ...,
9.57801851e-05, -2.76751908e-04, 5.71509542e-05],
[ 2.26078140e-04, 1.26173616e-04, 2.17306383e-04, ...,
3.30070375e-05, -2.47989495e-05, 1.16819595e-05],
[ 1.99410574e-04, 2.17306383e-04, 1.92984210e-03, ...,
-2.27400788e-04, -4.46589343e-05, -5.90865613e-05],
...,
[ 9.57801851e-05, 3.30070375e-05, -2.27400788e-04, ...,
How could I manually get a simple output like this in Pandas?
Thank you in advance.
If you insist on not using DataFrame.corr()
:
You aren't sending the correct lists to the numpy
function. You want each column as its own list, so you should be using the transpose of the DataFrame.
You want the correlation matrix, not the covariance matrix, so you should be using np.corrcoef()
. (If you really want the covariance matrix, then fine use np.cov()
)
Here's a df
I had in memory from another question.
open high low close volume
date
2017-11-01 44.66 44.75 42.19 42.93 3500
2017-11-03 44.66 44.75 42.19 42.93 3500
2017-11-06 43.15 43.75 40.60 41.02 9200
2017-11-07 43.15 43.75 40.60 41.02 9200
2017-11-08 43.15 43.75 40.60 41.02 9200
2017-11-09 43.15 43.75 40.60 41.02 9200
2017-11-10 43.15 43.75 40.60 41.02 9200
2017-11-13 41.60 43.21 40.03 42.36 3575
2017-11-14 41.60 43.21 40.03 42.36 3575
...
df.corr()
# open high low close volume
#open 1.000000 0.891708 0.957078 0.351604 0.320314
#high 0.891708 1.000000 0.878307 0.610183 0.311939
#low 0.957078 0.878307 1.000000 0.559366 0.146151
#close 0.351604 0.610183 0.559366 1.000000 -0.132609
#volume 0.320314 0.311939 0.146151 -0.132609 1.000000
np.corrcoef(df.loc[:,'open':].values.T)
#array([[ 1. , 0.89170836, 0.95707833, 0.35160354, 0.32031362],
# [ 0.89170836, 1. , 0.87830748, 0.61018322, 0.31193906],
# [ 0.95707833, 0.87830748, 1. , 0.55936625, 0.14615072],
# [ 0.35160354, 0.61018322, 0.55936625, 1. , -0.13260909],
# [ 0.32031362, 0.31193906, 0.14615072, -0.13260909, 1. ]])