pandas, dataframe, multi-level

Combine two level headers in df based on condition


I have a df read from an Excel file that has two header rows. The df looks like this, where both rows shown are header levels rather than data:

Name Unnamed Age
Unnamed Country Unnamed

Is there a way to combine them into one level, keeping for each column whichever label is not "Unnamed"? I did it using:

df.columns = [df.columns.get_level_values(0)[0],
        df.columns.get_level_values(1)[1],
        df.columns.get_level_values(0)[2]]

But I have many more columns than in the example, and the column order of the Excel file the df is generated from might sometimes change. So I don't want to depend on column positions.

So the expected result for columns is:

Name Country Age

Solution

  • If one level is Unnamed and the other holds the real label, use:

    df.columns = [a if b.startswith('Unnamed') else b for a, b in df.columns]
    

  • If both levels may hold real labels, join them, and fall back to the one that is not Unnamed otherwise:

    df.columns = [a if b.startswith('Unnamed') 
                    else b 
                    if a.startswith('Unnamed') 
                    else f'{a}_{b}' for a, b in df.columns]
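
Both rules above can be tried on a small frame. This is a minimal sketch, not part of the original answer: the sample labels (including the "Address"/"City" column and the "Unnamed: N_level_M" names), the helper name flatten_label and its sep parameter are assumptions made for illustration, and the str() cast is an added guard for non-string labels.

```python
import pandas as pd

# Hypothetical sample mirroring the question's two header rows.
# Headers read from Excel typically carry placeholder labels
# that start with "Unnamed" for empty cells.
columns = pd.MultiIndex.from_tuples([
    ("Name", "Unnamed: 0_level_1"),
    ("Unnamed: 1_level_0", "Country"),
    ("Age", "Unnamed: 2_level_1"),
    ("Address", "City"),  # both levels filled
])
df = pd.DataFrame([["Ann", "FR", 30, "Paris"]], columns=columns)


def flatten_label(a, b, sep="_"):
    """Return the label that is not 'Unnamed', or join both if both are real."""
    # str() is an added assumption so numeric labels do not raise.
    if str(b).startswith("Unnamed"):
        return a
    if str(a).startswith("Unnamed"):
        return b
    return f"{a}{sep}{b}"


# Iterating over MultiIndex columns yields (level0, level1) tuples,
# so the result does not depend on column positions.
flat = [flatten_label(a, b) for a, b in df.columns]
df.columns = flat
print(df.columns.tolist())
```

Because each column is decided from its own pair of labels, reordering the columns in the Excel file does not change which label is kept.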