Search code examples
pythonpython-3.xpandascountmulti-index

how to counting True and False of Multiindex in pandas?


i'm using multiindex in pandas.

Example1 :

'info'       '2002'     '2003'     '2004'
'PID' 'Age'  't1' 't2'  't1' 't2'  't1' 't2'
 101   30     Nan  2     2    1     Nan  Nan
 102   28     2    5     1    Nan   10   100   

I want to know the number of things that are worth either.

[num] and [num] is True

[Nan] and [num] is True

[Nan] and [Nan] is False

Example2 :

'info'       
'PID' 'Age' 'count' 
 101   30      2
 102   28      3

so i think using a any(), but i did not solve it.


Solution

  • I think need if MultiIndex also in index first check notna of all values, then check at least one per first level of MultiIndex in columns by DataFrameGroupBy.any and last count Trues by sum:

    print (df.index)
    MultiIndex(levels=[[101, 102], [28, 30]],
               labels=[[0, 1], [1, 0]])
    
    df = df.notna().groupby(axis=1, level=0).any().sum(axis=1)
    #oldier pandas versions
    #df = df.notnull().groupby(axis=1, level=0).any().sum(axis=1)
    

    If there is no MultiIndex in index add drop for remove info level:

    print (df.index)
    RangeIndex(start=0, stop=2, step=1)
    
    s = df.drop('info', level=0, axis=1).notnull().groupby(axis=1, level=0).any().sum(axis=1)
    print (s)
    0    2
    1    3
    dtype: int64
    

    And if need create column by position use insert:

    df.insert(2, ('info','count'), s)
    print (df)
       info             '2002'      '2003'      '2004'       
      'PID' 'Age' count   't1' 't2'   't1' 't2'   't1'   't2'
    0   101    30     2    NaN    2      2  1.0    NaN    NaN
    1   102    28     3    2.0    5      1  NaN   10.0  100.0