Search code examples
pythonpandasdataframe

Keep first instance of duplicate column name, unless empty then keep second instance of column


I have a dataframe with multiple columns, two of these columns have the same name ('mobilephone'), some values are empty, and some aren't but there will always be one of the two columns populated with a phone number:

             mobilephone          mobilephone 
0              999000111            999000111       
1              999000222               
2                                   999000333    
3              999000444            999000444   

How would I keep only one of these columns but populate the empty values in the first column with the values in the second column?


Solution

  • First, find the index of the second duplicate column. Then change the name of this column and use .loc[] to fill in the gaps:

    col_name = 'mobilephone'
    index_second_column = np.where(df.columns.duplicated())[0][0]
    df.columns.values[index_second_column] += '_1'
    
    df.loc[df[col_name].isnull(), colname] = df[col_name + '_1']
    df.drop(columns = col_name + '_1', inplace=True)