python pandas numpy keras

Add zero vector row per group in pandas


I want to create an equal-sized (padded) NumPy array from a pandas DataFrame, ultimately to be given as input to a Keras model.

import pandas as pd
df = pd.DataFrame([[1, 1.2, 2.2], 
                   [1, 3.2, 4.6],
                   [2, 5.5, 6.6]], columns = ['id', 'X1', 'X2']
                 )
df
>> 
   id   X1   X2
0   1   1.2  2.2
1   1   3.2  4.6
2   2   5.5  6.6

Expected output: a 3D NumPy array with padding

array[
        [
          [1.2, 2.2],
          [3.2, 4.6]
        ],
        [
          [5.5, 6.6],
          [0,   0]
        ]
     ]

Can anyone help me?


Solution

  • Use DataFrame.reindex with a per-group counter from GroupBy.cumcount to append the zero rows first:

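    import numpy as np

    # Number the rows within each id: 0, 1, 2, ...
    df['g'] = df.groupby('id').cumcount()
    
    ids = df['id'].unique()
    maxg = df['g'].max()+1   # length of the longest group
    # Reindex onto every (id, counter) pair; missing pairs become zero rows
    df1 = (df.set_index(['id','g'])
              .reindex(pd.MultiIndex.from_product([ids, np.arange(maxg)]), fill_value=0))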
    print (df1)
          X1   X2
    1 0  1.2  2.2
      1  3.2  4.6
    2 0  5.5  6.6
      1  0.0  0.0
    

    Then convert the values to a NumPy array and reshape it to 3D:

    a = df1.to_numpy().reshape(len(ids), maxg, len(df1.columns))
    print (a)
    [[[1.2 2.2]
      [3.2 4.6]]
    
     [[5.5 6.6]
      [0.  0. ]]]
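
The two steps above can be wrapped into one helper that leaves the original DataFrame untouched. This is only a minimal sketch: the function name `pad_groups_to_3d` and the temporary column name `_pos` are made up for illustration, and it assumes every column other than the group column is a numeric feature.

```python
import numpy as np
import pandas as pd


def pad_groups_to_3d(df, group_col="id"):
    """Return (array, ids); array has shape (n_groups, max_len, n_features)."""
    # Position of each row inside its group: 0, 1, 2, ...
    pos = df.groupby(group_col).cumcount()
    ids = df[group_col].unique()        # groups in order of first appearance
    max_len = int(pos.max()) + 1        # length of the longest group

    # Every (id, position) pair the padded result must contain.
    full_index = pd.MultiIndex.from_product([ids, np.arange(max_len)])

    # assign() works on a copy, so the caller's DataFrame gets no extra column.
    # "_pos" is a hypothetical name; it must not clash with a real column.
    padded = (df.assign(_pos=pos)
                .set_index([group_col, "_pos"])
                .reindex(full_index, fill_value=0))

    arr = padded.to_numpy().reshape(len(ids), max_len, padded.shape[1])
    return arr, ids


df = pd.DataFrame([[1, 1.2, 2.2],
                   [1, 3.2, 4.6],
                   [2, 5.5, 6.6]], columns=['id', 'X1', 'X2'])

arr, ids = pad_groups_to_3d(df)
print(arr.shape)   # (2, 2, 2)
```

The returned `ids` gives the group that each slice `arr[i]` belongs to, which is handy when the labels for the model have to be lined up with the padded sequences.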
    

    Alternative solution, using unstack to move the counter into the columns:

    df['g'] = df.groupby('id').cumcount()
    
    df1 = (df.set_index(['id','g']).unstack(fill_value=0)
             .sort_index(axis=1, level=1, sort_remaining=False))
    print (df1)
         X1   X2   X1   X2
    g     0    0    1    1
    id                    
    1   1.2  2.2  3.2  4.6
    2   5.5  6.6  0.0  0.0
    
    ids = df['id'].unique()
    maxg = df['g'].max()+1
    
    a = df1.to_numpy().reshape(len(ids),maxg, len(df1.columns) // maxg)
    print (a)
    [[[1.2 2.2]
      [3.2 4.6]]
    
     [[5.5 6.6]
      [0.  0. ]]]
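
A variant of the unstack approach, not part of the answer above, skips the column sort: after `unstack` the columns are grouped by feature first (X1 for every position, then X2), so the values can be reshaped to (groups, features, positions) and the last two axes swapped. The sketch below assumes the same example data; the names `pos`, `wide` and `slice_ids` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1.2, 2.2],
                   [1, 3.2, 4.6],
                   [2, 5.5, 6.6]], columns=['id', 'X1', 'X2'])

# Position of each row inside its id group.
pos = df.groupby('id').cumcount()
max_len = int(pos.max()) + 1

# One row per id; columns are (feature, position) pairs, feature-major.
wide = (df.assign(pos=pos)
          .set_index(['id', 'pos'])
          .unstack(fill_value=0))
n_features = wide.shape[1] // max_len

# (groups, features, positions) -> (groups, positions, features)
a = (wide.to_numpy()
         .reshape(len(wide), n_features, max_len)
         .transpose(0, 2, 1))

# unstack sorts the row index, so this is the id behind each slice a[i].
slice_ids = wide.index.to_numpy()
print(a.shape)   # (2, 2, 2)
```

Note that `unstack` orders the groups by sorted id, whereas the reindex solution with `df['id'].unique()` keeps them in order of first appearance, so the two arrays can differ in slice order when the ids are not already sorted.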