Search code examples
pythonpandasgroup-byxlwings

Adding more than one empty row between pandas groups


I want to add several empty rows between each groupby in my pandas dataframe. I know similar questions have been asked in the past but all of the answers I could find rely on the recently discontinued append function. I think I am close but I cannot get it to work.

From what I've read, the idea is for the concat function to replace append so I have been trying to 1) Make my groups, 2) Make a blank dataframe with the correct columns and number of rows, then 3) Loop through the groups and concatenate them individually with the blank dataframe. This looked something like:

Current df:

    column1    column2    column3
0      a          1        blue
1      b          2        blue
2      a          1        green
3      b          2        green
4      a          1        black
5      b          2        black

What I expect:

    column1    column2    column3
0      a          1        blue
1      b          2        blue
0
1
2
3
4
2      a          1        green
3      b          2        green
0
1
2
3
4
4      a          1        black
5      b          2        black
What I tried:
# Create my groups by the desired column
dfg = df.groupby("column3")

# Create my blank df with the same columns as my main df and with the desired number of blank rows
blank_df5 = pd.DataFrame(columns=['column1','column2','column3'],index=['0','1','2','3','4'])

# Loop through and concatenate groups and the blank df
for colors in dfg:
    pd.concat([colors, blank_df5], ignore_index=True)

print(dfg)

This returned: TypeError: cannot concatenate object of type '<class 'tuple'>'; only Series and DataFrame objs are valid

I then tried making the groups into their own dfs and then looping through that like:

dfg = df.groupby('column1')
[dfg.get_group(x) for x in dfg.groups]

blank_df5 = pd.DataFrame(columns=['column1','column2','column3'],index=['0','1','2','3','4'])

for colors in dfg:
    pd.concat([colors, blank_df5], ignore_index=True)

# I also tried [pd.concat([colors, blank_df5], ignore_index=True) for column3 in dfw] with the same result

result was still: TypeError: cannot concatenate object of type '<class 'tuple'>'; only Series and DataFrame objs are valid

Other things I've tried:**

mask = df['column3'].ne(df['column3'].shift(-1))
df1 = pd.DataFrame('', index=mask.index[mask] + .5, columns=df.columns)

dfg = pd.concat([df,df1]).sort_index().reset_index(drop=True).iloc[:-1]

print(dfg)

This works to add one empty row in-between the groups, but I can't get it to add more than that.

dfg = (pd.concat([df, 
            df.groupby('column3').apply(lambda x: x.shift(-1).iloc[-1]).reset_index()])
           .sort_values('column3')
           .reset_index(drop=True))

print(dfg)

This returns: ValueError: cannot insert column3, already exists

dfg = df.groupby('column1')

for colors in dfg:
        new_rows = 5
        new_index = pd.RangeIndex(len(colors)*(new_rows+1))
        dfg = pd.DataFrame(np.nan, index=new_index, columns=df.columns)
        ids = np.arange(len(colors))*(new_rows+1)
        dfg.loc[ids] = df.values

print(dfg)

This returns: ValueError: could not broadcast input array from shape (710,) into shape (2,) If I remove the loop and just run what is in the loop it adds the empty rows in-between each row of data.

Hopefully this makes sense, thank you in advance for any help.

If anyone is curious, the reason I need to do this is to dump it into excel, it's a company decision, not mine.


Solution

  • Following your 2nd approach :

    N = 5
    
    grps = df.groupby("column3", sort=False)
    
    out = pd.concat(
        [
            pd.concat([g, pd.DataFrame("", index=range(N), columns=df.columns)])
            if i < len(grps)-1 else g for i, (_, g) in enumerate(grps)
        ]
    )
    

    Output :

    print(out)
    
      column1 column2 column3
    0       a       1    blue
    1       b       2    blue
    0                        
    1                        
    2                        
    3                        
    4                        
    2       a       1   green
    3       b       2   green
    0                        
    1                        
    2                        
    3                        
    4                        
    4       a       1   black
    5       b       2   black
    
    [16 rows x 3 columns]