I would like to concatenate only along the rows, adding an empty (NaN) row after each group without adding a new column, using the following code. I have tried changing the index and the axis argument of the concat method with no luck.
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1':['a','b','c','a','b','c'], 'C2':[1,2,3,4,5,6], 'C3':[1,2,3,4,5,6]})
dfNew = pd.DataFrame()
for name, group in df.groupby('C1', sort=False):
    dfNew = pd.concat([dfNew, group, pd.Series(np.nan)])
Actual output:
C1 C2 C3 0
0 a 1.0 1.0 NaN
1 a 4.0 4.0 NaN
0 NaN NaN NaN NaN
0 b 2.0 2.0 NaN
1 b 5.0 5.0 NaN
0 NaN NaN NaN NaN
0 c 3.0 3.0 NaN
1 c 6.0 6.0 NaN
0 NaN NaN NaN NaN
Desired output:
C1 C2 C3
0 a 1.0 1.0
1 a 4.0 4.0
0 NaN NaN NaN
0 b 2.0 2.0
1 b 5.0 5.0
0 NaN NaN NaN
0 c 3.0 3.0
1 c 6.0 6.0
0 NaN NaN NaN
Thank you.
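Your solution works if you replace the Series with a new one-row DataFrame that has the same columns as df. An unnamed Series has no matching column labels, so concat adds it as an extra column named 0: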
dfNew = pd.DataFrame()
for name, group in df.groupby('C1', sort=False):
    dfNew = pd.concat([dfNew, group, pd.DataFrame(np.nan, columns=df.columns, index=[0])])
print(dfNew)
C1 C2 C3
0 a 1.0 1.0
3 a 4.0 4.0
0 NaN NaN NaN
1 b 2.0 2.0
4 b 5.0 5.0
0 NaN NaN NaN
2 c 3.0 3.0
5 c 6.0 6.0
0 NaN NaN NaN
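As a possible variation on the loop above, the pieces can be collected in a list and concatenated once at the end, which avoids rebuilding dfNew on every iteration. This is only a sketch: the names blank and pieces are introduced here for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1': ['a', 'b', 'c', 'a', 'b', 'c'],
                   'C2': [1, 2, 3, 4, 5, 6],
                   'C3': [1, 2, 3, 4, 5, 6]})

# One all-NaN separator row with the same columns as df (hypothetical helper name)
blank = pd.DataFrame(np.nan, columns=df.columns, index=[0])

pieces = []
for name, group in df.groupby('C1', sort=False):
    pieces.append(group)   # rows of the current group
    pieces.append(blank)   # separator row after the group

# Single concat at the end instead of one concat per iteration
dfNew = pd.concat(pieces)
print(dfNew)
```

The row order and index labels should match the loop version; only the number of concat calls changes.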
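A non-loop solution is also possible: sort by column C1 with DataFrame.sort_values, append one DataFrame of NaN rows whose index holds the last index label of each group, and finally sort the index with DataFrame.sort_index. The stable sort keeps each NaN row directly after the row that shares its label: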
df = df.sort_values('C1', ignore_index=True)
i = df.drop_duplicates('C1', keep='last').index
dfNew = pd.concat([df, pd.DataFrame(np.nan, columns=df.columns, index=i)]).sort_index(kind='stable')
print(dfNew)
C1 C2 C3
0 a 1.0 1.0
1 a 4.0 4.0
1 NaN NaN NaN
2 b 2.0 2.0
3 b 5.0 5.0
3 NaN NaN NaN
4 c 3.0 3.0
5 c 6.0 6.0
5 NaN NaN NaN
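If this is needed in more than one place, the non-loop approach could be wrapped in a small helper. The sketch below makes a few assumptions that are not in the original: the function name add_blank_rows and its key parameter are hypothetical, kind='stable' is passed to sort_values so that rows keep their original order inside each group, and the index is reset at the end to remove the duplicated labels.

```python
import numpy as np
import pandas as pd

def add_blank_rows(df, key):
    """Return df sorted by `key` with an all-NaN row after each group (hypothetical helper)."""
    # Stable sort keeps the original row order inside each group
    out = df.sort_values(key, kind='stable', ignore_index=True)
    # Index label of the last row of each group
    last = out.drop_duplicates(key, keep='last').index
    # One all-NaN row per group, labelled like the group's last row
    blanks = pd.DataFrame(np.nan, columns=out.columns, index=last)
    # Stable index sort places each blank row right after the row sharing its label
    out = pd.concat([out, blanks]).sort_index(kind='stable')
    # Assumption: a fresh 0..n-1 index is preferred over duplicated labels
    return out.reset_index(drop=True)

df = pd.DataFrame({'C1': ['a', 'b', 'c', 'a', 'b', 'c'],
                   'C2': [1, 2, 3, 4, 5, 6],
                   'C3': [1, 2, 3, 4, 5, 6]})

result = add_blank_rows(df, 'C1')
print(result)
```

Drop the final reset_index call if the duplicated index labels from the output above should be kept.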