Search code examples
pythonpandasdataframemulti-indexcalculated-columns

How to add level 0 columns in multi-index data frame and keep format


I have a large multi-indexed data frame with many rows and columns. I want to add a column to the data frame at level one, while keeping the level two structure. My first goal is to understand how to add a new blank column to this data frame. My second goal is to understand how to populate that new column with a calculation, in this case a simple subtraction between the first column fields and second column fields.

Original Data frame:

data = [[99,3,12,4,63,55]]

cols = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23'],['Values','Sites']])

df = pd.DataFrame(data, columns = cols)

df.columns.set_levels(['1. FY21','2. FY22','3. FY23'],level=0,inplace=True)
df.columns.set_levels(['Values', 'Sites'], level=1, inplace=True)

enter image description here

Desired Output #1 - Adding just an empty new column

data_new = [[99,3,12,4,63,55,"blank","blank"]]

cols_new = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23','4. New Column'],['Values','Sites']])

df_new = pd.DataFrame(data_new, columns = cols_new)

enter image description here

Desired Output #2 - Doing a calculation in new column

data_newer = [[99,3,12,4,63,55,87,-1]]

cols_new = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23','4. New Column'],['Values','Sites']])

df_newer = pd.DataFrame(data_newer, columns = cols_new)

enter image description here


Solution

  • You can do:

    # get unique names for columns at level 0
    cols = df.columns.get_level_values(0).unique()
    
    # get first column
    first_column = df.xs(cols[0], level=0, axis=1)
    
    # get second column
    second_column = df.xs(cols[1], level=0, axis=1)
    
    # do the computation
    d = first_column - second_column
    
    # add column level
    d = pd.concat({"4. New Column": d}, axis=1)
    print(d)
    

    Prints:

      4. New Column       
              Sites Values
    0            87     -1
    

    Then add the column to the original dataframe:

    df = pd.concat([df, d], axis=1)
    print(df)
    

    Prints:

      1. FY21        2. FY22        3. FY23        4. New Column       
        Sites Values   Sites Values   Sites Values         Sites Values
    0      99      3      12      4      63     55            87     -1
    

    EDIT: To add blank column to original dataframe:

    # create a blank dataframe
    blank = pd.DataFrame(
        {("4. New Column", "Sites"): [np.nan], ("4. New Column", "Values"): [np.nan]}
    )
    
    # concatenate the blank dataframe with original df
    df = pd.concat([df, blank], axis=1)
    print(df)
    

    Prints:

      1. FY21        2. FY22        3. FY23        4. New Column       
        Sites Values   Sites Values   Sites Values         Sites Values
    0      99      3      12      4      63     55           NaN    NaN
    

    Or: create new columns:

    df.loc[:, ("4. New Column", "Sites")] = [np.nan] * len(df)
    df.loc[:, ("4. New Column", "Values")] = [np.nan] * len(df)