I have a large DataFrame that I want to split at the rows where all columns are NaN or have no finite value. I am looking for something similar to the post "Drop rows of pandas dataframe that don't have finite values in certain variable(s)", but rather than dropping those rows I'd like to split on them.
I am currently on pandas 0.16.0.
It'll be quicker to find the all-NaN rows in your df by calling index.difference with the index labels returned from dropna(how='all'):
In [69]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [0, np.nan, 0], 'b': [np.nan, np.nan, 1]})
df = pd.concat([df]*10000, ignore_index=True)
%timeit df[df.apply(lambda x: x.isnull().all(), axis=1)]
%timeit df.loc[df.index.difference(df.dropna(how='all').index)]
1 loops, best of 3: 2.82 s per loop
100 loops, best of 3: 8.95 ms per loop
You can see that for a 30k-row df the latter method is much faster: about 9 ms versus 2.8 s, roughly 300 times quicker.
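The timings above only locate the all-NaN rows; they don't perform the split itself. One way to do the split, as a minimal sketch rather than a definitive answer, is to use a running count of the all-NaN rows as a block label and group on it. The helper name split_on_all_nan_rows and the sample data are made up for illustration, and I haven't timed this against the approaches above.

```python
import numpy as np
import pandas as pd


def split_on_all_nan_rows(df):
    """Return the consecutive blocks of df that lie between all-NaN rows."""
    # True for rows in which every column is NaN (the separator rows)
    is_sep = df.isnull().all(axis=1)
    # Each separator bumps the running count, so the rows between two
    # separators share the same block id
    block_id = is_sep.cumsum()
    # Drop the separators themselves, then group what is left by block id
    kept = df[~is_sep]
    return [block for _, block in kept.groupby(block_id[~is_sep])]


# Hypothetical sample: rows 2, 4 and 5 are entirely NaN
df = pd.DataFrame({
    'a': [0, 1, np.nan, 2, np.nan, np.nan, 3],
    'b': [np.nan, 1, np.nan, 2, np.nan, np.nan, 3],
})
parts = split_on_all_nan_rows(df)
for part in parts:
    print(part)
```

Consecutive separator rows don't produce empty blocks, because only the kept rows are grouped. Note that isnull does not treat inf as missing, so if "no finite value" should include infinities you could call df.replace([np.inf, -np.inf], np.nan) first and split the result.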