I have a matrix in the form of a DataFrame:
df =
     6M         1Y         2Y         4Y         5Y         10Y        30Y
6M   n/a        n/a        n/a        n/a        n/a        n/a        n/a
1Y   n/a        1          0.9465095  0.869504   0.8124711  0.64687    0.5089244
2Y   n/a        0.9465095  1          0.9343177  0.8880676  0.7423546  0.6048189
4Y   n/a        0.869504   0.9343177  1          0.9762842  0.8803984  0.7760753
5Y   n/a        0.8124711  0.8880676  0.9762842  1          0.9117788  0.8404656
10Y  n/a        0.64687    0.7423546  0.8803984  0.9117788  1          0.9514033
30Y  n/a        0.5089244  0.6048189  0.7760753  0.8404656  0.9514033  1
I read the values from a matrix (real numbers) and whenever there is no data I insert 'n/a'
(need to maintain this format for other reasons).
I would like to compute the eigenvalues of the subset of the DataFrame that contains float values (essentially the subset from '1Y' to '30Y').
I can extract the subset using iloc:
tmp = df.iloc[1:df.shape[0], 1:df.shape[1]]
and this extracts the correct values (I checked the types and they are float). But when I try to compute the eigenvalues of tmp using np.linalg.eigvalsh I get an error:
TypeError: No loop matching the specified signature and casting was found for ufunc eigvalsh_lo
The strange thing is that when I start from a DataFrame where 'n/a' is replaced by '0.0', the whole process works with no problem (it needs to be initialized with 0.0 and not, for instance, 0).
It seems that if some part of the DataFrame is not real, the subset extraction does not turn the values into real numbers.
Is there a way to overcome this problem?
IIUC you could convert your columns to numeric with pd.to_numeric, which replaces non-numeric values with NaN when errors='coerce' is passed. Then, using fillna(), you could fill them with 0 and use np.linalg.eigvals:
In [348]: df.apply(pd.to_numeric, errors='coerce')
Out[348]:
6M 1Y 2Y 4Y 5Y 10Y 30Y
6M NaN NaN NaN NaN NaN NaN NaN
1Y NaN 1.000000 0.946509 0.869504 0.812471 0.646870 0.508924
2Y NaN 0.946509 1.000000 0.934318 0.888068 0.742355 0.604819
4Y NaN 0.869504 0.934318 1.000000 0.976284 0.880398 0.776075
5Y NaN 0.812471 0.888068 0.976284 1.000000 0.911779 0.840466
10Y NaN 0.646870 0.742355 0.880398 0.911779 1.000000 0.951403
30Y NaN 0.508924 0.604819 0.776075 0.840466 0.951403 1.000000
In [350]: df.apply(pd.to_numeric, errors='coerce').fillna(0)
Out[350]:
6M 1Y 2Y 4Y 5Y 10Y 30Y
6M 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
1Y 0 1.000000 0.946509 0.869504 0.812471 0.646870 0.508924
2Y 0 0.946509 1.000000 0.934318 0.888068 0.742355 0.604819
4Y 0 0.869504 0.934318 1.000000 0.976284 0.880398 0.776075
5Y 0 0.812471 0.888068 0.976284 1.000000 0.911779 0.840466
10Y 0 0.646870 0.742355 0.880398 0.911779 1.000000 0.951403
30Y 0 0.508924 0.604819 0.776075 0.840466 0.951403 1.000000
In [351]: np.linalg.eigvals(df.apply(pd.to_numeric, errors='coerce').fillna(0))
Out[351]:
array([ 5.11329285, 0.7269089 , 0.07770957, 0.01334893, 0.02909796,
0.03964179, 0. ])
After applying pd.to_numeric all values become float:
In [352]: df.apply(pd.to_numeric, errors='coerce').dtypes
Out[352]:
6M float64
1Y float64
2Y float64
4Y float64
5Y float64
10Y float64
30Y float64
dtype: object
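To see why the sliced subset fails in the first place, here is a minimal sketch on a hypothetical 3x3 cut of the matrix from the question (the small frame and the names tmp, numeric and eig are only for illustration). The cells in the subset are Python floats, but the columns still carry the object dtype of the original frame, which is likely what eigvalsh rejects. Casting the slice with astype(float) gives it a real float64 dtype:

```python
import numpy as np
import pandas as pd

labels = ["6M", "1Y", "2Y"]
# Hypothetical 3x3 cut of the question's matrix; dtype=object mimics a frame
# that mixes 'n/a' strings with floats.
df = pd.DataFrame(
    [["n/a", "n/a", "n/a"],
     ["n/a", 1.0, 0.9465095],
     ["n/a", 0.9465095, 1.0]],
    index=labels, columns=labels, dtype=object,
)

# Same slice as in the question: drop the first row and the first column.
tmp = df.iloc[1:, 1:]
print(tmp.dtypes)  # the columns are still object, although every cell is a float

# Cast the slice to float64 before handing it to NumPy.
numeric = tmp.astype(float)
eig = np.linalg.eigvalsh(numeric.to_numpy())
print(eig)
```

For a symmetric 2x2 matrix with unit diagonal and off-diagonal a, the eigenvalues are 1 - a and 1 + a, so the result is easy to check by hand.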
Note: pd.to_numeric works only with pandas version >= 0.17.0.
If you have only 'n/a' values you could use replace and astype(float):
df.replace('n/a', 0).astype(float)
In [364]: df.replace('n/a', 0).astype(float)
Out[364]:
6M 1Y 2Y 4Y 5Y 10Y 30Y
6M 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
1Y 0 1.000000 0.946510 0.869504 0.812471 0.646870 0.508924
2Y 0 0.946510 1.000000 0.934318 0.888068 0.742355 0.604819
4Y 0 0.869504 0.934318 1.000000 0.976284 0.880398 0.776075
5Y 0 0.812471 0.888068 0.976284 1.000000 0.911779 0.840466
10Y 0 0.646870 0.742355 0.880398 0.911779 1.000000 0.951403
30Y 0 0.508924 0.604819 0.776075 0.840466 0.951403 1.000000
In [365]: np.linalg.eigvals(df.replace('n/a', 0).astype(float))
Out[365]:
array([ 5.11329285, 0.7269089 , 0.07770957, 0.01334893, 0.02909796,
0.03964179, 0. ])
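Filling with 0 keeps the empty '6M' row and column in the matrix, which is where the trailing 0. eigenvalue in the output above comes from. If you only want the eigenvalues of the '1Y' to '30Y' block, one alternative (a sketch, not the only way) is to coerce to NaN and then drop the rows and columns that are entirely NaN. The small 3x3 frame and the names numeric, sub and eig below are hypothetical. Since the remaining block is symmetric, eigvalsh can be used, as in the question:

```python
import numpy as np
import pandas as pd

labels = ["6M", "1Y", "2Y"]
# Hypothetical 3x3 cut of the question's matrix with 'n/a' placeholders.
df = pd.DataFrame(
    [["n/a", "n/a", "n/a"],
     ["n/a", 1.0, 0.9465095],
     ["n/a", 0.9465095, 1.0]],
    index=labels, columns=labels, dtype=object,
)

# Turn every non-numeric cell into NaN; all columns become float64.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop rows and columns that contain nothing but NaN (here: '6M').
sub = numeric.dropna(axis=0, how="all").dropna(axis=1, how="all")

# The remaining block is symmetric, so eigvalsh applies.
eig = np.linalg.eigvalsh(sub.to_numpy())
print(sub)
print(eig)
```

This assumes the missing data forms whole rows and columns, as in the question. If NaN values are scattered inside the block, they would survive dropna(how='all') and would need separate handling.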