python pandas numpy multi-index

Pandas Multi-Index DataFrame to Numpy Ndarray


I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:

               s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

I would like the resulting numpy.ndarray to be the following, with shape (2, 2, 4):

[[[ 0.0  0.0  0.8  0.2 ]
  [ 0.1  0.0  0.9  0.0 ]]

 [[ 0.0  0.0  0.9  0.1 ]
  [ 0.0  0.0  1.0  0.0]]]

I have tried df.as_matrix() but this returns:

 [[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]
  [ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]

How do I return a list of lists for the first level, with each inner list holding the records for one Action?


Solution

  • You could use the following:

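    # Number of distinct values in the first index level (Action)
    dim1 = len(df.index.get_level_values(0).unique())
    # -1 lets NumPy infer the number of rows per Action
    result = df.values.reshape((dim1, -1, df.shape[1]))
    print(result)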
    [[[ 0.   0.   0.8  0.2]
      [ 0.1  0.   0.9  0. ]]
    
     [[ 0.   0.   0.9  0.1]
      [ 0.   0.   1.   0. ]]]
    

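    The first statement counts the distinct values in the first index level (Action), which is the number of groups. The reshape relies on every Action having the same number of State rows.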

    Why this (or groupby) is needed: .values returns a plain 2-D array and discards the MultiIndex, so the grouping is lost. You have to pass that dimensionality to NumPy explicitly.
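
For reference, here is a self-contained sketch of both approaches mentioned above (reshape and groupby), with the DataFrame from the question rebuilt by hand. The variable names n_actions, n_states and stacked are illustrative only. It uses to_numpy() rather than as_matrix(), which was removed in pandas 1.0, and it assumes every Action has the same set of States.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# Size of each index level (assumes a complete Action x State grid)
n_actions = df.index.get_level_values("Action").nunique()
n_states = df.index.get_level_values("State").nunique()

# Approach 1: sort by (Action, State) so the row order matches the
# target layout, then reshape the flat 2-D values
result = df.sort_index().to_numpy().reshape(n_actions, n_states, df.shape[1])

# Approach 2: one 2-D block per Action, stacked along a new first axis
stacked = np.stack([group.to_numpy() for _, group in df.groupby(level="Action")])

print(result.shape)
print(np.array_equal(result, stacked))
```

If some Action is missing a State, the reshape raises a ValueError and np.stack fails on the unequal block shapes, so in that case the frame would need to be reindexed to a full grid first.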