Search code examples
pandasdataframemissing-data

How to partition a dataframe into dataframes with no missing and contiguous indices?


Suppose I have the following DataFrame

index  Value
  0     'A'
  1     'B'
  2    pd.NA
  3    pd.NA
  4     'C'
  5    pd.NA

Then, I'm looking for a function that returns the next two DataFrames:

First:

index  Value
  0     'A'
  1     'B'

Second:

index  Value
  4     'C'

Solution

  • You can build a mask of NAs (with isna), form groups of successive non-NAs with cumsum and split with groupby:

    m = df['Value'].isna()
    
    dfs = [g for _,g in df[~m].groupby(m.cumsum())]
    

    Output:

    [   index Value
     0      0     A
     1      1     B,
        index Value
     4      4     C]
    

    Intermediates:

       index Value   isna      ~  cumsum
    0      0     A  False   True       0
    1      1     B  False   True       0
    2      2   NaN   True  False       1
    3      3   NaN   True  False       2
    4      4     C  False   True       2
    5      5   NaN   True  False       3