Search code examples
pythonpandascumsum

How to calculate cumulative missing values group by an ID (in python)?


a) given the following "id" and "freq"

df = pd.DataFrame({'id':[1,1,1,1,1,1,2,2,2,2,2,3,3,3],'freq':[1,2,np.NaN, np.NaN, np.NaN, 6,7,8,9,10,np.NaN,np.NaN,13,14]})

df

b) how to calculate cumulative missing of "freq" group by "id"? with a reset to zero when freq > 0

so that the result 'cum_null' should look like - print(df(['cum_null']) 0 0 1 2 3 0 0 0 0 0 1 1 0 0

c) I've tried this. Very close, but cannot reset to zero when freq > 0

df['cum_null'] = id_grp['freq'].apply(lambda x:x.isnull().astype(int).cumsum())

df


Solution

  • If you case you can do groupby with mask

    df['cum_null'] = df.freq.isnull().groupby(df['id']).cumsum().where(df.freq.isnull(),0)
    0     0
    1     0
    2     1
    3     2
    4     3
    5     0
    6     0
    7     0
    8     0
    9     0
    10    1
    11    1
    12    0
    13    0
    Name: freq, dtype: int64