python pandas dataframe multi-index calculated-columns

How to add level 0 columns in multi-index data frame and keep format

I have a large multi-indexed data frame with many rows and columns. I want to add a column to the data frame at level one, while keeping the level two structure. My first goal is to understand how to add a new blank column to this data frame. My second goal is to understand how to populate that new column with a calculation, in this case a simple subtraction between the first column fields and second column fields.

Original Data frame:

data = [[99,3,12,4,63,55]]

cols = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23'],['Values','Sites']])

df = pd.DataFrame(data, columns = cols)

df.columns.set_levels(['1. FY21','2. FY22','3. FY23'],level=0,inplace=True)
df.columns.set_levels(['Values', 'Sites'], level=1, inplace=True)

Desired Output #1 - Adding just an empty new column

data_new = [[99,3,12,4,63,55,"blank","blank"]]

cols_new = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23','4. New Column'],['Values','Sites']])

df_new = pd.DataFrame(data_new, columns = cols_new)

Desired Output #2 - Doing a calculation in new column

data_newer = [[99,3,12,4,63,55,87,-1]]

cols_new = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23','4. New Column'],['Values','Sites']])

df_newer = pd.DataFrame(data_newer, columns = cols_new)

Solution

You can do:

# get unique names for columns at level 0
cols = df.columns.get_level_values(0).unique()

# get first column
first_column = df.xs(cols[0], level=0, axis=1)

# get second column
second_column = df.xs(cols[1], level=0, axis=1)

# do the computation
d = first_column - second_column

# add column level
d = pd.concat({"4. New Column": d}, axis=1)
print(d)

Prints:

  4. New Column       
          Sites Values
0            87     -1

Then add the column to the original dataframe:

df = pd.concat([df, d], axis=1)
print(df)

Prints:

  1. FY21        2. FY22        3. FY23        4. New Column       
    Sites Values   Sites Values   Sites Values         Sites Values
0      99      3      12      4      63     55            87     -1

EDIT: To add blank column to original dataframe:

# create a blank dataframe
blank = pd.DataFrame(
    {("4. New Column", "Sites"): [np.nan], ("4. New Column", "Values"): [np.nan]}
)

# concatenate the blank dataframe with original df
df = pd.concat([df, blank], axis=1)
print(df)

Prints:

  1. FY21        2. FY22        3. FY23        4. New Column       
    Sites Values   Sites Values   Sites Values         Sites Values
0      99      3      12      4      63     55           NaN    NaN

Or: create new columns:

df.loc[:, ("4. New Column", "Sites")] = [np.nan] * len(df)
df.loc[:, ("4. New Column", "Values")] = [np.nan] * len(df)