Search code examples
pythonpandasdataframemulti-index

How to add empty sub-columns to a multi-index dataframe?


I'm working with a multiindex - multicolumns DataFrame in Pandas. I have 16 main columns (level 0), such as Climate Change, Ozone Depletion,etc; and each of them contains 2 subcolumns (level 1): Agriculture and Transformation.

enter image description here

I want to create a new subcolumn (Transport) to each main column, that I will fill later with new data. The thing is that I don't want to make a loop (not sure how I would do that) but I can't find any direct way to perform it.

Already tried the loop option:

for i in list(df.columns.levels[0]):
    df[i, 'Transport'] = ''

However, this duplicates the main columns and create them with the new empty field. Is there anyway to do it without a loop? If there isn't, how do I group these new created columns?

Already check this question but didn't for me: Add a subcolumn to each column on multilevel column index

Thanks in advance.

EDIT

import pandas as pd
import numpy as np

df = pd.DataFrame([[1,2,3,4], [5,6,7,8],[9,10,11,12]], columns = pd.MultiIndex.from_product([('Acidification', 'Climate Change'), ('Agriculture', 'Transformation')]), index=['potato', 'tomato', 'onion'])

With @Laurent answer I got the following dataframe:

enter image description here

In order to get what I wanted, I needed to change one line:

dfs = []
for col in df.columns.levels[0]:
    tmp = df.copy()[col]
    tmp.loc[:, ("Transport")] = "" #This is the modified line
    dfs.append(tmp)
new_df = pd.concat(dfs, axis=1, keys=df.columns.levels[0])

This output:

enter image description here

However, I want a would like to have a more general method (maybe this one is, but I don't know how to do it).

Question 1: If now I would like to add new two subcolumns (level 2) to "Transport" subcolumn (level 1) to all main columns (level 0), how can I do it?

I have tried the following code, but raise an error of unhashable type: 'slice'

# Try to apply logic above
dfn = []
for col in new_df.columns.levels[0]:
    for scol in new_df.columns.levels[1]:
        tmp = new_df.copy()[col][scol]
        tmp.loc[:, ('National')] = ""
        tmp.loc[:, ('Import')] = ""
        dfn.append(tmp)
new_dfn = pd.concat(dfn, axis=1, keys=new_df.columns.levels[0])

Question 2: Is there a general way to add new subsubcolumns to a specific subcolumn (in all columns)?


Solution

  • With the following toy dataframe as an example:

    import pandas as pd
    
    df = pd.DataFrame(
        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
        columns=pd.MultiIndex.from_product([("col1", "col2"), ("sub_col1", "sub_col2")]),
        index=[0, 1, 2],
    )
    
    print(df)
    # Output
          col1              col2
      sub_col1 sub_col2 sub_col1 sub_col2
    0        1        2        3        4
    1        5        6        7        8
    2        9       10       11       12
    

    Here is one way to do it using Pandas advanced indexing with hierarchical index and concat:

    dfs = []
    for col in df.columns.levels[0]:
        tmp = df.copy()[col]
        tmp.loc[:, ("sub_col3")] = ""
        dfs.append(tmp)
    
    new_df = pd.concat(dfs, axis=1, keys=df.columns.levels[0])
    

    Then:

    print(new_df)
    # Output
          col1                            col2
      sub_col1 sub_col2 col1 sub_col3 sub_col1 sub_col2 col2 sub_col3
    0        1        2                      3        4
    1        5        6                      7        8
    2        9       10                     11       12
    

    And here is how to add two new subcolumns (level 2) to a [target] subcolumn (level 1) to all main columns (level 0):

    dfs = []
    sub_col = "sub_col3"
    sub_sub_cols = ["sub_sub_col1", "sub_sub_col2"]
    for col in df.columns.levels[0]:
        tmp = df.copy()[col]
        tmp = pd.concat([tmp, pd.DataFrame([[""]*tmp.shape[1] for _ in range(tmp.shape[0])], columns=pd.MultiIndex.from_product([[sub_col], sub_sub_cols]))], axis=1)
        dfs.append(tmp)
    
    new_df = pd.concat(dfs, axis=1, keys=df.columns.levels[0])
    

    Then:

    print(new_df)
    # Output
        col1                                                             \
      sub_col1 sub_col2 (sub_col3, sub_sub_col1) (sub_col3, sub_sub_col2)   
    0        1        2                                                     
    1        5        6                                                     
    2        9       10                                                     
    
          col2                                                             
      sub_col1 sub_col2 (sub_col3, sub_sub_col1) (sub_col3, sub_sub_col2)  
    0        3        4                                                    
    1        7        8                                                    
    2       11       12                                                    
    

    Since all columns of a given level do not have the same number of subcolumns, columns header are represented by tuples.