Search code examples
python-3.xpandassplitnan

Get all the rows with and without NaN in pandas dataframe


Most efficient way of splitting the row which contains with and without NaN in pandas dataframe.

input :- ID    Gender    Dependants   Income   Education  Married
         1     Male      2            500      Graduate   Yes
         2     NaN       4            2500     Graduate   No
         3     Female    3            NaN      NaN        Yes
         4     Male      NaN          7000     Graduate   Yes
         5     Female    4            500      Graduate   NaN
         6     Female    2            4500     Graduate   Yes

The expected output without NaN is,

ID    Gender    Dependants    Income    Education    Married
1     Male      2             500       Graduate     Yes
6     Female    2             4500      Graduate     Yes

The expected output with NaN is,

ID    Gender    Dependants    Income    Education    Married
2     NaN       4             2500      Graduate     No
3     Female    3             NaN       NaN          Yes
4     Male      NaN           7000      Graduate     Yes
5     Female    4             500       Graduate     NaN 

Solution

  • Use boolean indexing with check missing values and any for check at least one True per rows:

    mask = df.isnull().any(axis=1)
    
    df1 = df[~mask]
    df2 = df[mask]
    print (df1)
       ID  Gender  Dependants  Income Education Married
    0   1    Male         2.0   500.0  Graduate     Yes
    5   6  Female         2.0  4500.0  Graduate     Yes
    
    print (df2)
       ID  Gender  Dependants  Income Education Married
    1   2     NaN         4.0  2500.0  Graduate      No
    2   3  Female         3.0     NaN       NaN     Yes
    3   4    Male         NaN  7000.0  Graduate     Yes
    4   5  Female         4.0   500.0  Graduate     NaN
    

    Details:

    print (df.isnull())
         ID  Gender  Dependants  Income  Education  Married
    0  False   False       False   False      False    False
    1  False    True       False   False      False    False
    2  False   False       False    True       True    False
    3  False   False        True   False      False    False
    4  False   False       False   False      False     True
    5  False   False       False   False      False    False
    
    print (mask)
    0    False
    1     True
    2     True
    3     True
    4     True
    5    False
    dtype: bool
    

    And you can always use a more readable way of the previous code where you don't need to invert the mask:

    mask = df.notna().any(axis=1)
    df1 = df[mask]
    

    Same exact result.