Search code examples
pythonpandas

Merging Two Rows (one with a value, the other NaN) in Pandas


I am aware similar questions have been asked before (How to merge two rows in a dataframe pandas, etc), but I am still struggling to do the following (except with pandas dataframe with many rows):

    team_token  day1    day2   day3  day4
0   abc          1      NaN     NaN   NaN
1   abc          NaN     1      NaN   NaN
2   abc          NaN     NaN    NaN    NaN
3   abc          NaN     NaN    NaN     1

I want to combine the rows with the same team_token so that the end result looks like:

    team_token  day1    day2  day3  day4
0   abc           1      1    NaN     1

Thank you in advance.


Solution

  • Use combine_first on two Series given by the two rows of you dataframe:

    import pandas as pd
    df = pd.DataFrame({'team_token':['abc', 'abc'], 'day1': [1, None], 'day2' : [None, 1]})
    
    df.loc[0].combine_first(df.loc[1])
    

    gives:

     team_token  day1    day2
    0   abc           1       1 
    

    #EDIT: A better solution that works also with your updated answer is simply:

    df.max()
    

    assuming that your day columns contain either Nan or ones.

    In case you have different team_tokens:

       day1  day2  day3 team_token
    0   1.0   NaN   NaN        abc
    1   NaN   NaN   NaN        abc
    2   1.0   1.0   NaN        abc
    3   NaN   NaN   NaN        abc
    4   1.0   NaN   1.0        ddd
    5   1.0   NaN   1.0        ddd
    6   NaN   NaN   1.0        ddd
    

    you can group_by and still take the max of the group:

    df.groupby('team_token').max().reset_index()
    
    #  team_token  day1  day2  day3
    #0        abc   1.0   1.0   NaN
    #1        ddd   1.0   NaN   1.0