
Find the eigenvalues of a subset of Dataframe in Python


I have a matrix in the form of a DataFrame:

   df=     6M         1Y         2Y         4Y         5Y        10Y        30Y
      6M   n/a        n/a        n/a        n/a        n/a        n/a        n/a
      1Y   n/a          1  0.9465095   0.869504  0.8124711    0.64687  0.5089244
      2Y   n/a  0.9465095          1  0.9343177  0.8880676  0.7423546  0.6048189
      4Y   n/a   0.869504  0.9343177          1  0.9762842  0.8803984  0.7760753
      5Y   n/a  0.8124711  0.8880676  0.9762842          1  0.9117788  0.8404656
      10Y  n/a    0.64687  0.7423546  0.8803984  0.9117788          1  0.9514033
      30Y  n/a  0.5089244  0.6048189  0.7760753  0.8404656  0.9514033          1

I read the values from a matrix of real numbers, and wherever there is no data I insert 'n/a' (I need to keep this format for other reasons). I would like to compute the eigenvalues of the subset of the DataFrame that contains float values (essentially the block from '1Y' to '30Y').

I can extract the subset using iloc

tmp = df.iloc[1:df.shape[0], 1:df.shape[1]]

and this extracts the correct values (I checked the types and they are float). But when I try to compute the eigenvalues of tmp using np.linalg.eigvalsh, I get an error:

TypeError: No loop matching the specified signature and casting
was found for ufunc eigvalsh_lo

The strange thing is that when I start from a DataFrame where 'n/a' is replaced by 0.0, the whole process works with no problem (it needs to be initialized with 0.0 and not, for instance, 0). It seems that if some part of the DataFrame is not real, the subset extraction does not turn the values into real numbers.

Is there a way to overcome this problem?


Solution

  • IIUC, you could convert your columns to numeric with pd.to_numeric, which replaces non-numeric values with NaN when errors='coerce' is passed. Then you could fill the NaN values with 0 using fillna() and call np.linalg.eigvals:

    In [348]: df.apply(pd.to_numeric, errors='coerce')
    Out[348]:
         6M        1Y        2Y        4Y        5Y       10Y       30Y
    6M  NaN       NaN       NaN       NaN       NaN       NaN       NaN
    1Y  NaN  1.000000  0.946509  0.869504  0.812471  0.646870  0.508924
    2Y  NaN  0.946509  1.000000  0.934318  0.888068  0.742355  0.604819
    4Y  NaN  0.869504  0.934318  1.000000  0.976284  0.880398  0.776075
    5Y  NaN  0.812471  0.888068  0.976284  1.000000  0.911779  0.840466
    10Y NaN  0.646870  0.742355  0.880398  0.911779  1.000000  0.951403
    30Y NaN  0.508924  0.604819  0.776075  0.840466  0.951403  1.000000
    
    In [350]: df.apply(pd.to_numeric, errors='coerce').fillna(0)
    Out[350]:
         6M        1Y        2Y        4Y        5Y       10Y       30Y
    6M    0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
    1Y    0  1.000000  0.946509  0.869504  0.812471  0.646870  0.508924
    2Y    0  0.946509  1.000000  0.934318  0.888068  0.742355  0.604819
    4Y    0  0.869504  0.934318  1.000000  0.976284  0.880398  0.776075
    5Y    0  0.812471  0.888068  0.976284  1.000000  0.911779  0.840466
    10Y   0  0.646870  0.742355  0.880398  0.911779  1.000000  0.951403
    30Y   0  0.508924  0.604819  0.776075  0.840466  0.951403  1.000000
    
    In [351]: np.linalg.eigvals(df.apply(pd.to_numeric, errors='coerce').fillna(0))
    Out[351]:
    array([ 5.11329285,  0.7269089 ,  0.07770957,  0.01334893,  0.02909796,
            0.03964179,  0.        ])
    
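Filling with 0 keeps the empty '6M' row and column in the matrix, which is where the extra 0 eigenvalue in the output above comes from. As a possible alternative, the all-NaN row and column could be dropped after the conversion, so that only the numeric block is decomposed. The sketch below uses a small made-up 3x3 frame in the same layout (an assumption for illustration, not the full data from the question):

```python
import numpy as np
import pandas as pd

# Small illustrative frame in the question's layout (values are a subset chosen for brevity).
labels = ["6M", "1Y", "2Y"]
df = pd.DataFrame(
    [["n/a", "n/a", "n/a"],
     ["n/a", 1.0, 0.9465095],
     ["n/a", 0.9465095, 1.0]],
    index=labels, columns=labels,
)

# Convert everything to float; 'n/a' becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop rows and columns that are entirely NaN, leaving only the numeric block.
sub = numeric.dropna(axis=0, how="all").dropna(axis=1, how="all")

# eigvalsh assumes a symmetric matrix and returns eigenvalues in ascending order.
eig_sub = np.linalg.eigvalsh(sub.to_numpy())

# For comparison: filling with 0 keeps the 3x3 shape and adds a zero eigenvalue.
eig_filled = np.linalg.eigvals(numeric.fillna(0).to_numpy())

print(eig_sub)
print(eig_filled)
```

Note that dropna(how='all') only removes rows or columns that are completely empty; if NaN values are scattered inside the block, a different strategy would be needed.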

    After applying pd.to_numeric, all values become float:

    In [352]: df.apply(pd.to_numeric, errors='coerce').dtypes
    Out[352]:
    6M     float64
    1Y     float64
    2Y     float64
    4Y     float64
    5Y     float64
    10Y    float64
    30Y    float64
    dtype: object
    
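The dtype is likely the root of the error in the question: a column that mixes 'n/a' strings with numbers is stored with dtype object, and slicing with iloc keeps that dtype even when every remaining element is a Python float. A minimal sketch of this, using a small made-up frame in the same layout as the question:

```python
import numpy as np
import pandas as pd

# Illustrative frame: columns '1Y' and '2Y' mix the string 'n/a' with floats.
labels = ["6M", "1Y", "2Y"]
df = pd.DataFrame(
    [["n/a", "n/a", "n/a"],
     ["n/a", 1.0, 0.9465095],
     ["n/a", 0.9465095, 1.0]],
    index=labels, columns=labels,
)

# The slice contains only floats, but the column dtype is still object.
tmp = df.iloc[1:, 1:]
print(tmp.dtypes)
print(type(tmp.iloc[0, 0]))

# An explicit cast gives float64 columns that the LAPACK-backed routines accept.
tmp_float = tmp.astype(float)
print(tmp_float.dtypes)

eig = np.linalg.eigvalsh(tmp_float.to_numpy())
print(eig)
```

This would also explain why initializing with 0.0 works in the question: with no strings in a column, pandas can store it as float64 from the start.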

    Note that pd.to_numeric requires pandas version >= 0.17.0.

    If the only non-numeric values are 'n/a', you could use replace and astype(float):

    df.replace('n/a', 0).astype(float)
    
    In [364]: df.replace('n/a', 0).astype(float)
    Out[364]:
         6M        1Y        2Y        4Y        5Y       10Y       30Y
    6M    0  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
    1Y    0  1.000000  0.946510  0.869504  0.812471  0.646870  0.508924
    2Y    0  0.946510  1.000000  0.934318  0.888068  0.742355  0.604819
    4Y    0  0.869504  0.934318  1.000000  0.976284  0.880398  0.776075
    5Y    0  0.812471  0.888068  0.976284  1.000000  0.911779  0.840466
    10Y   0  0.646870  0.742355  0.880398  0.911779  1.000000  0.951403
    30Y   0  0.508924  0.604819  0.776075  0.840466  0.951403  1.000000
    
    In [365]: np.linalg.eigvals(df.replace('n/a', 0).astype(float))
    Out[365]:
    array([ 5.11329285,  0.7269089 ,  0.07770957,  0.01334893,  0.02909796,
            0.03964179,  0.        ])
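
If the 'n/a' entries have to stay in df, one option is to leave df untouched and cast only the extracted block to float before calling np.linalg.eigvalsh, as the question originally attempted. The sketch below rebuilds the frame from the values printed in the question; the way the frame is constructed here is an assumption, since the question does not show how df was created:

```python
import numpy as np
import pandas as pd

labels = ["6M", "1Y", "2Y", "4Y", "5Y", "10Y", "30Y"]

# Correlation values as printed in the question ('1Y' to '30Y' block).
corr = [
    [1.0, 0.9465095, 0.869504, 0.8124711, 0.64687, 0.5089244],
    [0.9465095, 1.0, 0.9343177, 0.8880676, 0.7423546, 0.6048189],
    [0.869504, 0.9343177, 1.0, 0.9762842, 0.8803984, 0.7760753],
    [0.8124711, 0.8880676, 0.9762842, 1.0, 0.9117788, 0.8404656],
    [0.64687, 0.7423546, 0.8803984, 0.9117788, 1.0, 0.9514033],
    [0.5089244, 0.6048189, 0.7760753, 0.8404656, 0.9514033, 1.0],
]

# Assumed construction: first row and first column hold the 'n/a' placeholder.
rows = [["n/a"] * len(labels)] + [["n/a"] + r for r in corr]
df = pd.DataFrame(rows, index=labels, columns=labels)

# Extract the numeric block and cast it to float; df itself keeps its 'n/a' entries.
tmp = df.iloc[1:, 1:].astype(float)

# The block is symmetric, so eigvalsh applies; it returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(tmp.to_numpy())
print(eigenvalues)
```

The six values should correspond to the nonzero eigenvalues in the outputs above, sorted in ascending order, and without the trailing 0 that the zero-filled '6M' row and column contribute.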