Search code examples
pythonpandasdataframemulti-index

Add a empty column in pandas df with multiindex adjacent to an existing column creates duplicates


I am trying to harmonize the structure of two dataframes. They have the same columns except one of the dataframes which has had a df.compare() done to it so it is a multi index with "self" and "other" as sub columns.

In the end i wish to append this harmonized dataframe to the dataframe which had the compare operation done to it.

Existing df
                    Pants       Jacket
        Denim       yes         yes
        Cotton      no          no
        Silk        no          no


Goal structure
                    Pants           Jacket
                    self    other   self    other
        Denim       yes     new     yes     new
        Cotton      no      new     no      new
        Silk        no      new     no      new

I add the first sub-level like this, which adds the "self" element to the columns.

df_initial_unique.columns = [df_initial_unique.columns.get_level_values(0), np.repeat("self",df_initial_unique.shape[1])]

Then i want to add the "other" column, which should have the same value for all elements, "new", by iterating through the columns.

for col in df_initial_unique.columns:
    df_initial_unique[(col,"other")] = "new"

For some reason this do not add the "other" column as a sub column, but copies all columns and adding the "other" as a sub column. Columns double in value.

                    Pants       Jacket      Pants       Jacket
                    self        self        other       other
        Denim       yes         yes         new         new
        Cotton      no          no          new         new
        Silk        no          no          new         new

I have tried many solutions presented here but none seem to work.


Solution

  • One possible solution:

    df.columns = pd.MultiIndex.from_product([df.columns, ["self"]])
    
    df = df.reindex(
        pd.MultiIndex.from_product([df.columns.get_level_values(0), ["self", "other"]]),
        axis=1,
        fill_value="new",
    )
    
    print(df)
    

    Prints:

           Pants       Jacket      
            self other   self other
    Denim    yes   new    yes   new
    Cotton    no   new     no   new
    Silk      no   new     no   new