python pandas notnull

Find the row indices of a pandas dataframe that don't have finite values


I have a large dataframe that I want to split at the rows where all columns are NaN or otherwise don't have a finite value. I am looking for something similar to the post "Drop rows of pandas dataframe that don't have finite values in certain variable(s)", but rather than dropping those rows I'd like to split on them.

I am currently on pandas 0.16.0.


Solution

  • It'll be quicker to select the all-NaN rows of your df by calling index.difference on the index labels returned from dropna(how='all') than by testing each row with apply:

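    In [69]:
    import numpy as np
    import pandas as pd

    # 3-row frame whose middle row is all-NaN, repeated to 30,000 rows
    df = pd.DataFrame({'a': [0, np.nan, 0], 'b': [np.nan, np.nan, 1]})
    df = pd.concat([df]*10000, ignore_index=True)

    # row-wise apply: the lambda is called once per row
    %timeit df[df.apply(lambda x: x.isnull().all(), axis=1)]
    # index difference: all labels minus those that survive dropna(how='all')
    %timeit df.loc[df.index.difference(df.dropna(how='all').index)]

    1 loops, best of 3: 2.82 s per loop
    100 loops, best of 3: 8.95 ms per loop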
    

    You can see that for a 30k-row df, the latter method is much faster (8.95 ms versus 2.82 s, roughly 300 times faster).
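
The answer above only locates the all-NaN rows; it does not perform the split, and it does not treat infinite values as non-finite. A minimal sketch of one way to do both follows. The helper name `split_on_blank_rows` and the sample data are hypothetical, and mapping ±inf to NaN before testing is an assumption about what "don't have a finite value" should mean here. It was written against a recent pandas, not specifically 0.16.0.

```python
import numpy as np
import pandas as pd


def split_on_blank_rows(df):
    """Split df into chunks separated by rows with no finite value in any column.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Assumption: +/-inf counts as "not finite", so map it to NaN first.
    cleaned = df.replace([np.inf, -np.inf], np.nan)
    # True for separator rows: every column is NaN after the replacement.
    is_blank = cleaned.isnull().all(axis=1)
    # Each separator bumps the running count, so the rows between two
    # separators share the same chunk id.
    chunk_id = is_blank.cumsum()
    # Drop the separator rows and group what is left by chunk id.
    kept = df[~is_blank]
    return [chunk for _, chunk in kept.groupby(chunk_id[~is_blank])]


# Sample data: rows 2 and 4 have no finite value in any column.
df = pd.DataFrame({'a': [0, 1, np.nan, 2, np.inf, 3],
                   'b': [1, 1, np.nan, 2, np.nan, 3]})
chunks = split_on_blank_rows(df)
```

With this sample frame, `chunks` holds three dataframes covering rows 0 to 1, row 3, and row 5. The boolean mask avoids the row-wise `apply` that the timing above shows to be slow; if only the labels of the separator rows are needed, `df.index[is_blank]` inside the helper would give them.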