Pandas merge rows if blank

I am looking to combine rows based on an id, as long as the merge does not overwrite a value.

So for a DataFrame like:

      Column1  Column2  Column3  Column4
aa_1  123               456
aa_2           123
aa_4                    123
aa_6
aa_1           789               789
aa_2  4567
aa_4  234      456      678      987
aa_6  123      456      567      765

I am looking for the end result to be:

,Column1,Column2,Column3,Column4
aa_1,123,789,456,789
aa_2,4567,123,,
aa_4,234,456,678,987
aa_4,,,123,
aa_6,123,456,567,765

Code for recreating dataframe:

import pandas as pd

data = [['123', '', '456', ''], ['', '123', '', ''], ['', '', '123', ''], ['', '', '', ''], ['', '789', '', '789'], ['4567', '', '', ''], ['234', '456', '678', '987'], ['123', '456', '567', '765']]
index = ['aa_1', 'aa_2', 'aa_4', 'aa_6', 'aa_1', 'aa_2', 'aa_4', 'aa_6']
columns = ['Column1', 'Column2', 'Column3', 'Column4']

df = pd.DataFrame(data, index, columns)

Solution

It may not be the most elegant of all possible solutions, but this does the job.

The idea is to first set aside the rows that are already completely filled. In your case these are the last two rows of your example. The reason is that they will always conflict with any other row that has the same index (aa_4 and aa_6). The next step is to merge the rows that contain empty cells. This is easier to deal with if:

The index is unique (hence the reset_index in the solution)
The empty cells are NaN (hence the replace statement)

So:

import pandas as pd
import numpy as np

data = [['123', '', '456', ''], ['', '123', '', ''], ['', '', '123', ''], ['', '', '', ''], ['', '789', '', '789'], ['4567', '', '', ''], ['234', '456', '678', '987'], ['123', '456', '567', '765']]
index = ['aa_1', 'aa_2', 'aa_4', 'aa_6', 'aa_1', 'aa_2', 'aa_4', 'aa_6']
columns = ['Column1', 'Column2', 'Column3', 'Column4']

df = pd.DataFrame(data, index, columns)
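df = df.reset_index()  # unique integer index; the ids move to an 'index' column

df2 = df.replace(r'^\s*$', np.nan, regex=True)  # blank cells -> NaN
df3 = df2.dropna()  # rows that are already completely filled
df4 = df[~df.index.isin(df3.index)]  # remaining rows, each with at least one blank
df5 = df4.groupby(['index']).sum().reset_index()  # summing strings concatenates them per id
df_full = pd.concat([df5, df3]).set_index(['index'])  # restores your original index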

which returns

      Column1 Column2 Column3 Column4
index                                
aa_1      123     789     456     789
aa_2     4567     123                
aa_4                      123        
aa_6                                 
aa_4      234     456     678     987
aa_6      123     456     567     765
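
The merge step works because sum() on string columns concatenates the strings in each group, so a blank cell contributes nothing. A minimal sketch of that behaviour, using two small hypothetical frames (demo and clash are illustrative names, not part of the original data):

```python
import pandas as pd

# Hypothetical two-row frame sharing one id, for illustration only.
demo = pd.DataFrame(
    {"Column1": ["123", ""], "Column2": ["", "789"]},
    index=["aa_1", "aa_1"],
)

# On string columns, sum() concatenates: '123' + '' gives '123'.
merged = demo.groupby(level=0).sum()

# Caveat: if both rows hold a value in the same column, the two strings
# are joined together rather than one of them being kept.
clash = pd.DataFrame({"Column1": ["123", "456"]}, index=["aa_1", "aa_1"])
joined = clash.groupby(level=0).sum()
```

So this approach is only safe when no two partial rows of the same id fill the same column, which is the case in your example.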

The result still contains the empty aa_6 row. To remove it, convert the blanks to NaN again and drop the rows that are entirely empty, e.g.,

df_full = df_full.replace(r'^\s*$', np.nan, regex=True)
df_full = df_full.dropna(how='all')
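
As a possible variant (a sketch, not the method used above), the same steps can be done on the NaN version of the frame with groupby(...).first(), which keeps the first non-missing value per column instead of concatenating strings. The names nan_df, is_full, partial and conflicts are illustrative assumptions; conflicts flags any id/column pair where more than one partial row holds a value, i.e. where a merge would have to overwrite something:

```python
import numpy as np
import pandas as pd

data = [['123', '', '456', ''], ['', '123', '', ''], ['', '', '123', ''],
        ['', '', '', ''], ['', '789', '', '789'], ['4567', '', '', ''],
        ['234', '456', '678', '987'], ['123', '456', '567', '765']]
index = ['aa_1', 'aa_2', 'aa_4', 'aa_6', 'aa_1', 'aa_2', 'aa_4', 'aa_6']
columns = ['Column1', 'Column2', 'Column3', 'Column4']
df = pd.DataFrame(data, index=index, columns=columns)

# Treat blank or whitespace-only cells as missing.
nan_df = df.replace(r'^\s*$', np.nan, regex=True)

# Rows without any missing cell are set aside untouched.
is_full = nan_df.notna().all(axis=1)
full_rows = nan_df[is_full]
partial = nan_df[~is_full]

# More than one filled cell per id and column means a merge would
# have to overwrite a value.
conflicts = partial.groupby(level=0).count() > 1

# first() keeps the first non-missing value of each column in a group;
# ids whose partial rows are all empty are dropped afterwards.
merged = partial.groupby(level=0).first().dropna(how='all')

# Put the untouched full rows back and show missing cells as blanks.
result = pd.concat([merged, full_rows]).fillna('')
```

For the example data, conflicts contains no True entry, and result holds the same rows as the desired output, though the two aa_4 rows appear in the opposite order.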