
Stop adding an extra column when concatenating to a DataFrame while grouping by a column


I would like to concatenate along the rows only, so that a NaN row follows each group and no new column is added, using the following code. I have tried changing the index and the axis argument of pd.concat, with no luck.


import numpy as np
import pandas as pd

df = pd.DataFrame({'C1':['a','b','c','a','b','c'], 'C2':[1,2,3,4,5,6], 'C3':[1,2,3,4,5,6]})
dfNew = pd.DataFrame()
for name, group in df.groupby('C1', sort=False):
    dfNew = pd.concat([dfNew, group, pd.Series(np.nan)])
   

Actual output:


    C1   C2   C3   0
0    a  1.0  1.0 NaN
1    a  4.0  4.0 NaN
0  NaN  NaN  NaN NaN
0    b  2.0  2.0 NaN
1    b  5.0  5.0 NaN
0  NaN  NaN  NaN NaN
0    c  3.0  3.0 NaN
1    c  6.0  6.0 NaN
0  NaN  NaN  NaN NaN

Desired output:

    C1   C2   C3
0    a  1.0  1.0
1    a  4.0  4.0
0  NaN  NaN  NaN
0    b  2.0  2.0
1    b  5.0  5.0
0  NaN  NaN  NaN
0    c  3.0  3.0
1    c  6.0  6.0
0  NaN  NaN  NaN

Thank you.

Solution

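  • Your loop works if you append a one-row DataFrame of NaN with the same columns as df instead of a Series. pd.concat treats an unnamed Series as a separate column labelled 0, which is where the extra column comes from: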

    dfNew = pd.DataFrame()
    for name, group in df.groupby('C1', sort=False):
        dfNew = pd.concat([dfNew, group, pd.DataFrame(np.nan, columns=df.columns, index=[0])])
    
    
    print(dfNew)
    
        C1   C2   C3
    0    a  1.0  1.0
    3    a  4.0  4.0
    0  NaN  NaN  NaN
    1    b  2.0  2.0
    4    b  5.0  5.0
    0  NaN  NaN  NaN
    2    c  3.0  3.0
    5    c  6.0  6.0
    0  NaN  NaN  NaN
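
The same loop can also collect its pieces in a list and call pd.concat once at the end, which avoids rebuilding dfNew on every iteration. This is only a sketch of that variation, not part of the original answer; the names blank and pieces are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'C1': ['a', 'b', 'c', 'a', 'b', 'c'],
    'C2': [1, 2, 3, 4, 5, 6],
    'C3': [1, 2, 3, 4, 5, 6],
})

# One all-NaN row that shares df's columns, so concat aligns it
# with the existing columns instead of creating a new column 0.
blank = pd.DataFrame(np.nan, columns=df.columns, index=[0])

# 'pieces' is a hypothetical helper list: each group followed by the blank row.
pieces = []
for _, group in df.groupby('C1', sort=False):
    pieces.extend([group, blank])

# Single concatenation along the rows (axis=0 is the default).
dfNew = pd.concat(pieces)
print(dfNew)
```

The result should match the loop version above, including the repeated index label 0 on the blank rows.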
    

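    A non-loop solution is also possible: sort the DataFrame by column C1 with DataFrame.sort_values, append a single DataFrame of NaN rows whose index labels are the last row of each group, and finally sort the index with DataFrame.sort_index so that each NaN row lands right after its group: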

    df = df.sort_values('C1', ignore_index=True)
    
    i = df.drop_duplicates('C1', keep='last').index
    dfNew = pd.concat([df, pd.DataFrame(np.nan, columns=df.columns, index=i)]).sort_index(kind='stable')
    print(dfNew)
    
        C1   C2   C3
    0    a  1.0  1.0
    1    a  4.0  4.0
    1  NaN  NaN  NaN
    2    b  2.0  2.0
    3    b  5.0  5.0
    3  NaN  NaN  NaN
    4    c  3.0  3.0
    5    c  6.0  6.0
    5  NaN  NaN  NaN
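
If duplicate index labels are not wanted, the non-loop approach can be wrapped in a small function that renumbers the rows at the end. This is a sketch under assumptions: add_blank_rows is a hypothetical helper name, and it passes kind='stable' to sort_values as well so that rows keep their original order inside each group.

```python
import numpy as np
import pandas as pd

def add_blank_rows(df, key):
    """Return df sorted by `key` with one all-NaN row after each group."""
    # A stable sort keeps the original row order inside each group.
    ordered = df.sort_values(key, kind='stable', ignore_index=True)
    # Index label of the last row in every group.
    last = ordered.drop_duplicates(key, keep='last').index
    blanks = pd.DataFrame(np.nan, columns=ordered.columns, index=last)
    # Each blank row shares a label with the row it should follow;
    # the stable index sort keeps the blank row after that row.
    out = pd.concat([ordered, blanks]).sort_index(kind='stable')
    # Renumber the rows 0..n-1 to remove the duplicate labels.
    return out.reset_index(drop=True)

df = pd.DataFrame({
    'C1': ['a', 'b', 'c', 'a', 'b', 'c'],
    'C2': [1, 2, 3, 4, 5, 6],
    'C3': [1, 2, 3, 4, 5, 6],
})
result = add_blank_rows(df, 'C1')
print(result)
```

Dropping reset_index(drop=True) gives back the index shown in the output above.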