Search code examples
python-3.xpandasdataframemulti-index

multiindex column and rows match if the column and row names are similar and exclude the values from adding in those cell python pandas


I need the cells excluded from summation when the top level of the multiindex (country1 and Country2) are the same in the row and columns

import pandas as pd
label_columns1 = ['Country1','Country2']
label_columns2 = ['agri1', 'agri2']
label_rows1 = ['Country1','Country2']
label_rows2 = ['agri1', 'agri2']

df = pd.DataFrame([[1,2,3,4]], index=pd.MultiIndex.from_product([label_rows1,label_rows2]), 
                     columns=pd.MultiIndex.from_product([label_columns1, label_columns2 ]))
df
                     Country1     Country2
                   agri1  agri2   agri1  agri2  
Country1    agri1   1      2      3       4
            agri2   1      2      3       4
Country2    agri1   1      2      3       4
            agri2   1      2      3       4

I know this will give me the row-wise sum and a new column

df[('Total',"Vlaue")] = df.sum(axis=1)

df

                     Country1     Country2  Total
                   agri1  agri2   agri1  agri2 Value    
Country1    agri1   1      2      3       4     10
            agri2   1      2      3       4     10
Country2    agri1   1      2      3       4     10
            agri2   1      2      3       4     10

What I really want and have been trying hard was only to add the cells if the column and row index is not the same. Which in this case for example the first row will be 7. my data frame is big and too much rows and columns. What i have tried so far

for col in df.columns:
    for row in df.index:
        if col != row:
            df[('Total',"Vlaue")] = df.sum(axis=1)

which gives me unintended results but I guess i am close My expected result should look like

                     Country1     Country2      Total
                   agri1  agri2   agri1  agri2  Value   
Country1    agri1   1      2      3       4     (3+4) = 7
            agri2   1      2      3       4     (3+4) = 7
Country2    agri1   1      2      3       4     (1+2) = 3
            agri2   1      2      3       4     (1+2) = 3

Solution

  • Mask the values where index is same as the column label, then calculate sum along axis=1

    ix = df.index.get_level_values(0)
    cx = df.columns.get_level_values(0)
    
    m = ix.values[:, None] == cx.values
    df[('Total', 'Value')] = df.mask(m).sum(axis=1)
    

                   Country1       Country2       Total
                      agri1 agri2    agri1 agri2 Value
    Country1 agri1        1     2        3     4   7.0
             agri2        1     2        3     4   7.0
    Country2 agri1        1     2        3     4   3.0
             agri2        1     2        3     4   3.0