I want to create an equal-sized, zero-padded NumPy array from a pandas DataFrame, ultimately to be used as input to a Keras model.
import pandas as pd
df = pd.DataFrame([[1, 1.2, 2.2],
                   [1, 3.2, 4.6],
                   [2, 5.5, 6.6]], columns=['id', 'X1', 'X2'])
df
>>
   id   X1   X2
0   1  1.2  2.2
1   1  3.2  4.6
2   2  5.5  6.6
Expected output: a 3D NumPy array with one block per id, where shorter groups are padded with zero rows:
array[
    [
        [1.2, 2.2],
        [3.2, 4.6]
    ],
    [
        [5.5, 6.6],
        [0, 0]
    ]
]
Can anyone help me?
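Use DataFrame.reindex with a counter created by GroupBy.cumcount to append the missing zero rows first. The counter numbers the rows within each id, and reindexing against every (id, counter) combination adds the missing pairs filled with 0: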
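import numpy as np

# position of each row within its id group: 0, 1, ...
df['g'] = df.groupby('id').cumcount()
ids = df['id'].unique()
maxg = df['g'].max() + 1  # length of the longest group
# reindex against every (id, position) pair; missing pairs become zero rows
df1 = (df.set_index(['id', 'g'])
         .reindex(pd.MultiIndex.from_product([ids, np.arange(maxg)]), fill_value=0))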
print (df1)
      X1   X2
1 0  1.2  2.2
  1  3.2  4.6
2 0  5.5  6.6
  1  0.0  0.0
Then convert the values to a NumPy array and reshape it to 3D (number of ids, longest group length, number of feature columns):
a = df1.to_numpy().reshape(len(ids), maxg, len(df1.columns))
print (a)
[[[1.2 2.2]
  [3.2 4.6]]

 [[5.5 6.6]
  [0.  0. ]]]
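The two steps above can be wrapped into one function. This is only a minimal sketch: the name to_padded_3d and its parameters are hypothetical (not part of pandas or NumPy), and it assumes the rows of each id are already in the order you want them to appear in the output. Unlike the snippet above, it does not add a helper column to the original DataFrame.

```python
import numpy as np
import pandas as pd


def to_padded_3d(df, id_col, feature_cols):
    """Hypothetical helper: zero-pad every id group to the longest group's length."""
    # Position of each row inside its id group (0, 1, 2, ...).
    pos = df.groupby(id_col).cumcount()
    ids = df[id_col].unique()        # ids in order of first appearance
    max_len = int(pos.max()) + 1     # length of the longest group
    # Every (id, position) pair that the padded result must contain.
    full_index = pd.MultiIndex.from_product([ids, np.arange(max_len)])
    padded = (df.assign(_pos=pos)
                .set_index([id_col, "_pos"])[feature_cols]
                .reindex(full_index, fill_value=0))
    return padded.to_numpy().reshape(len(ids), max_len, len(feature_cols))


example = pd.DataFrame([[1, 1.2, 2.2],
                        [1, 3.2, 4.6],
                        [2, 5.5, 6.6]], columns=["id", "X1", "X2"])
a = to_padded_3d(example, "id", ["X1", "X2"])
print(a.shape)  # (2, 2, 2)
```

The resulting shape is (number of ids, longest group length, number of features), which is the layout asked for in the question.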
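Alternative solution using DataFrame.unstack, which pivots the counter into columns and fills the missing positions with 0: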
df['g'] = df.groupby('id').cumcount()
df1 = (df.set_index(['id', 'g'])
         .unstack(fill_value=0)
         .sort_index(axis=1, level=1, sort_remaining=False))
print (df1)
     X1   X2   X1   X2
g     0    0    1    1
id
1   1.2  2.2  3.2  4.6
2   5.5  6.6  0.0  0.0
ids = df['id'].unique()
maxg = df['g'].max() + 1
# each row holds maxg blocks of feature columns, so divide to get the feature count
a = df1.to_numpy().reshape(len(ids), maxg, len(df1.columns) // maxg)
print (a)
[[[1.2 2.2]
  [3.2 4.6]]

 [[5.5 6.6]
  [0.  0. ]]]
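As a cross-check of both results, the same array can also be built by filling a preallocated zero array directly. This is a different technique from the two solutions above, shown only as a sketch: it uses pd.factorize for the block index of each row and GroupBy.cumcount for the position within the block, and the variable names (feature_cols, codes, pos, out) are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1.2, 2.2],
                   [1, 3.2, 4.6],
                   [2, 5.5, 6.6]], columns=["id", "X1", "X2"])
feature_cols = ["X1", "X2"]

# Integer code per id (0, 1, ...) in order of first appearance.
codes, uniques = pd.factorize(df["id"])
# Position of each row inside its id group.
pos = df.groupby("id").cumcount().to_numpy()

# Start from zeros so every slot that is not filled stays as padding.
out = np.zeros((len(uniques), pos.max() + 1, len(feature_cols)))
# Write each row into its (id block, position) slot.
out[codes, pos] = df[feature_cols].to_numpy()
print(out)
```

Because the array starts as zeros, no reindexing or unstacking is needed; only the rows that exist are written.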