I have a large multi-indexed data frame with many rows and columns. I want to add a column to the data frame at level one, while keeping the level two structure. My first goal is to understand how to add a new blank column to this data frame. My second goal is to understand how to populate that new column with a calculation, in this case a simple subtraction between the first column fields and second column fields.
Original Data frame:
data = [[99,3,12,4,63,55]]
cols = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23'],['Values','Sites']])
df = pd.DataFrame(data, columns = cols)
df.columns.set_levels(['1. FY21','2. FY22','3. FY23'],level=0,inplace=True)
df.columns.set_levels(['Values', 'Sites'], level=1, inplace=True)
Desired Output #1 - Adding just an empty new column
data_new = [[99,3,12,4,63,55,"blank","blank"]]
cols_new = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23','4. New Column'],['Values','Sites']])
df_new = pd.DataFrame(data_new, columns = cols_new)
Desired Output #2 - Doing a calculation in new column
data_newer = [[99,3,12,4,63,55,87,-1]]
cols_new = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23','4. New Column'],['Values','Sites']])
df_newer = pd.DataFrame(data_newer, columns = cols_new)
You can do:
# get unique names for columns at level 0
cols = df.columns.get_level_values(0).unique()
# get first column
first_column = df.xs(cols[0], level=0, axis=1)
# get second column
second_column = df.xs(cols[1], level=0, axis=1)
# do the computation
d = first_column - second_column
# add column level
d = pd.concat({"4. New Column": d}, axis=1)
print(d)
Prints:
4. New Column
Sites Values
0 87 -1
Then add the column to the original dataframe:
df = pd.concat([df, d], axis=1)
print(df)
Prints:
1. FY21 2. FY22 3. FY23 4. New Column
Sites Values Sites Values Sites Values Sites Values
0 99 3 12 4 63 55 87 -1
EDIT: To add blank column to original dataframe:
# create a blank dataframe
blank = pd.DataFrame(
{("4. New Column", "Sites"): [np.nan], ("4. New Column", "Values"): [np.nan]}
)
# concatenate the blank dataframe with original df
df = pd.concat([df, blank], axis=1)
print(df)
Prints:
1. FY21 2. FY22 3. FY23 4. New Column
Sites Values Sites Values Sites Values Sites Values
0 99 3 12 4 63 55 NaN NaN
Or: create new columns:
df.loc[:, ("4. New Column", "Sites")] = [np.nan] * len(df)
df.loc[:, ("4. New Column", "Values")] = [np.nan] * len(df)