Search code examples
pythonpandasmulti-index

Creating new column based on NaNs in multi-index data frame (Python)


I have the following multi index data frame and I want to create a new column that tells me wether a company still exists or not. ID and Year are part of the multi index.

 id  Year  Profit/Loss Total Sales  
 0   2008  300.        2000.        
 0   2009  400.        2000.       
 0   2010  500.        2000.       
 0   2011  NaN         NaN       
 0   2012  NaN         NaN   
 1   2008  300.        2000.       
 1   2009  300.        2000.  

I would need a function that checks if both columns (Total Sales and Profit/Loss) are NaN in a specific year and if they are return a 0 in the solvency column. If one of them or both have values, then it should return a 1.

Desired output:

 id  Year  Profit/Loss Total Sales  Solvency
 0   2008  300.        2000.        1
 0   2009  400.        2000.        1
 0   2010  500.        2000.        1
 0   2011  NaN         NaN          0
 0   2012  NaN         NaN          0
 1   2008  300.        2000.        1
 1   2009  300.        2000.        1

Solution

  • You can use notna to identify the non-NaN columns, aggregate to boolean if any is True per row and convert to integer with astype:

    df['Solvency'] = df[['Profit/Loss', 'Total Sales']].notna().any(1).astype(int)
    

    output:

       id  Year  Profit/Loss  Total Sales  Solvency
    0   0  2008        300.0       2000.0         1
    1   0  2009        400.0       2000.0         1
    2   0  2010        500.0       2000.0         1
    3   0  2011          NaN          NaN         0
    4   0  2012          NaN          NaN         0
    5   1  2008        300.0       2000.0         1
    6   1  2009        300.0       2000.0         1