Search code examples
pythonpandasduplicatesdrop

Pandas Merge and Complete rows with same id


Here is an extract of my dataframe :

ID LU MA ME JE VE SA DI
200 B B B
201 C C C C C
211 A
211 D D D
211 B
213 A A
216 K K K K
216 K
217 B B B B B

I have some rows with same ID and want to "merge" them into only one row while completing them. Here is an example of what I want to have as a result :

ID LU MA ME JE VE SA DI
200 B B B
201 C C C C C
211 A D D D B
213 A A
216 K K K K K
217 B B B B B

I'm new to pandas dataframes and have try to use drop_duplicates method but I need something different because of the restriction on keep parameters. Also the dataframe is sorted by ID.


Solution

  • If there is only one non empty value per groups use:

    df = df.replace('',np.nan).groupby('ID', as_index=False).first().fillna('')
    

    If possible multiple values and need unique values in original order use lambda function:

    print (df)
        ID LU MA ME JE VE SA DI
    0  201  B     C  B         
    1  201  C  C  C  B  C    
    
    
    f = lambda x: ','.join(dict.fromkeys(x.dropna()).keys())
    df = df.replace('',np.nan).groupby('ID', as_index=False).agg(f)
    print (df)
        ID   LU MA ME JE VE SA DI
    0  201  B,C  C  C  B  C