Stop adding an extra column when concatenating to a DataFrame while grouping by a column

I would like to add only rows (a NaN row after each group) using the following code, but the result also gets an extra column named 0. I have tried changing the index and the axis argument of concat, with no luck.


import numpy as np
import pandas as pd

df = pd.DataFrame({'C1':['a','b','c','a','b','c'], 'C2':[1,2,3,4,5,6], 'C3':[1,2,3,4,5,6]})
dfNew = pd.DataFrame()
for name, group in df.groupby('C1', sort=False):
    dfNew = pd.concat([dfNew, group, pd.Series(np.nan)])

Actual output:


    C1   C2   C3   0
0    a  1.0  1.0 NaN
1    a  4.0  4.0 NaN
0  NaN  NaN  NaN NaN
0    b  2.0  2.0 NaN
1    b  5.0  5.0 NaN
0  NaN  NaN  NaN NaN
0    c  3.0  3.0 NaN
1    c  6.0  6.0 NaN
0  NaN  NaN  NaN NaN

Desired output:

    C1   C2   C3
0    a  1.0  1.0
1    a  4.0  4.0
0  NaN  NaN  NaN
0    b  2.0  2.0
1    b  5.0  5.0
0  NaN  NaN  NaN
0    c  3.0  3.0
1    c  6.0  6.0
0  NaN  NaN  NaN

Thank you.

Solution

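Your approach works with one change. An unnamed Series is treated by concat as a single column named 0, which is where the extra column comes from. Concatenate a one-row DataFrame of NaN with the same columns instead: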

dfNew = pd.DataFrame()
for name, group in df.groupby('C1', sort=False):
    dfNew = pd.concat([dfNew, group, pd.DataFrame(np.nan, columns=df.columns, index=[0])])


print(dfNew)

    C1   C2   C3
0    a  1.0  1.0
3    a  4.0  4.0
0  NaN  NaN  NaN
1    b  2.0  2.0
4    b  5.0  5.0
0  NaN  NaN  NaN
2    c  3.0  3.0
5    c  6.0  6.0
0  NaN  NaN  NaN
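
As a variation on the loop above, the pieces can be collected in a list and concatenated once, which avoids rebuilding dfNew on every iteration. This is only a sketch under the same sample data as the question; the names blank and pieces are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1': ['a', 'b', 'c', 'a', 'b', 'c'],
                   'C2': [1, 2, 3, 4, 5, 6],
                   'C3': [1, 2, 3, 4, 5, 6]})

# One all-NaN separator row that shares df's column labels
# (illustrative name, not from the original answer).
blank = pd.DataFrame(np.nan, columns=df.columns, index=[0])

# Gather every group followed by a separator row, then concatenate once.
pieces = []
for _, group in df.groupby('C1', sort=False):
    pieces.extend([group, blank])

dfNew = pd.concat(pieces)
print(dfNew)
```

Passing ignore_index=True to pd.concat should give a fresh 0..n-1 index if the repeated index labels are not wanted.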

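A non-loop solution is also possible: sort the DataFrame by column C1 with DataFrame.sort_values, concatenate a single DataFrame of NaN rows indexed by the last row of each group, and finally sort the index with DataFrame.sort_index, using a stable sort so that each NaN row lands right after the row sharing its index label: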

df = df.sort_values('C1', ignore_index=True)

i = df.drop_duplicates('C1', keep='last').index
dfNew = pd.concat([df, pd.DataFrame(np.nan, columns=df.columns, index=i)]).sort_index(kind='stable')
print(dfNew)

    C1   C2   C3
0    a  1.0  1.0
1    a  4.0  4.0
1  NaN  NaN  NaN
2    b  2.0  2.0
3    b  5.0  5.0
3  NaN  NaN  NaN
4    c  3.0  3.0
5    c  6.0  6.0
5  NaN  NaN  NaN
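
The non-loop steps can also be wrapped in a small function. The sketch below is one possible arrangement, not a definitive implementation: the helper name add_blank_rows is hypothetical, kind='stable' in sort_values is an added assumption to keep the original row order inside each group, and the final reset_index is optional. Note that sort_values orders the groups by value, whereas groupby(..., sort=False) keeps them in order of first appearance; for this sample data both give a, b, c.

```python
import numpy as np
import pandas as pd


def add_blank_rows(df, col):
    """Hypothetical helper: insert one all-NaN row after each group of `col`."""
    # A stable sort keeps the original row order inside each group.
    ordered = df.sort_values(col, ignore_index=True, kind='stable')
    # Index labels of the last row of every group.
    last = ordered.drop_duplicates(col, keep='last').index
    blanks = pd.DataFrame(np.nan, columns=ordered.columns, index=last)
    # The blank rows reuse the labels of those last rows; because they are
    # listed after the data rows, a stable index sort puts each blank row
    # directly after the data row with the same label.
    out = pd.concat([ordered, blanks]).sort_index(kind='stable')
    # Optional: replace the duplicated labels with a clean 0..n-1 index.
    return out.reset_index(drop=True)


df = pd.DataFrame({'C1': ['a', 'b', 'c', 'a', 'b', 'c'],
                   'C2': [1, 2, 3, 4, 5, 6],
                   'C3': [1, 2, 3, 4, 5, 6]})
result = add_blank_rows(df, 'C1')
print(result)
```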