Tags: python, pandas, dataframe, sum, nan

Find the sum of values in one column over rows where another column is NaN in pandas


I have a DataFrame with columns A and B. Column A is non-continuous: some of its rows are NaN, while B has a value in every row. I would like to create a third column, C, such that for each run of NaN rows in A, the values of B in those rows are summed together with the next valid value in B, and that sum is stored in C on the next valid row. C should be NaN wherever A is NaN, and equal to B on valid rows that do not follow a NaN run. Example:

import pandas as pd

data = {
    'A': [1, 1, None, None, 2, 5, None, None, 3, 4, 3, None, 5],
    'B': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130]}
df = pd.DataFrame(data)

Everything works except for the rows that need the sum of B over the NaN run plus the next valid value in B. This is the code I have so far, but it has become a mess:

result = df.groupby(df['A'].isnull().cumsum())['B'].sum().reset_index()
df_result = pd.DataFrame({'C': result['Pumped']})
df_result.loc[1:, 'C'] -= result.loc[0, 'Pumped']

df.loc[~mask, 'C'] = df.loc[~mask, 'Pumped']

valid_rows_after_nan = df['dWL'].notnull() & mask.shift(1).fillna(False)

df.loc[valid_rows_after_nan, 'C'] = df_result

print(df)

I would like the output to look like this:

data = {
    'A': [1,  1, None, None, 2, 5, None, None,3 ,4, 3, None , 5],
    'B': [10, 20, 30,  40,  50, 60, 70, 80, 90, 100, 110, 120, 130],
    'C': [10, 20, None, None, 120, 60, None, None, 240, 100, 110, None, 250]
}

Solution

  • A simple version using groupby.transform:

    # identify the non-NA and reverse
    m = df.loc[::-1, 'A'].notna()
    
    # group the preceding NA, sum, mask where NA
    df['C'] = df.groupby(m.cumsum())['B'].transform('sum').where(m)
    

    Output:

          A    B      C
    0   1.0   10   10.0
    1   1.0   20   20.0
    2   NaN   30    NaN
    3   NaN   40    NaN
    4   2.0   50  120.0
    5   5.0   60   60.0
    6   NaN   70    NaN
    7   NaN   80    NaN
    8   3.0   90  240.0
    9   4.0  100  100.0
    10  3.0  110  110.0
    11  NaN  120    NaN
    12  5.0  130  250.0
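
To see why the reversed cumulative sum groups the rows correctly, the approach above can be run step by step on the question's data. This is a minimal sketch: the names `group_id`, `forward_id`, and `C_forward` are introduced here for illustration only, and the forward-only variant at the end is an alternative of this sketch, not part of the original answer.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    'A': [1, 1, None, None, 2, 5, None, None, 3, 4, 3, None, 5],
    'B': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130],
})

# True where A holds a valid value, taken in reverse row order.
m = df.loc[::-1, 'A'].notna()

# Counting valid rows from the bottom up gives each valid row, together
# with the NaN rows directly above it, the same group id.
group_id = m.cumsum()

# Sum B per group, then keep the sum only on the valid rows.
# Both groupby and where align on the index, so the reversed order is harmless.
df['C'] = df.groupby(group_id)['B'].transform('sum').where(m)

# Forward-only variant (hypothetical alternative, not from the answer):
# start a new group on the row after each valid value in A.
forward_id = df['A'].notna().shift(fill_value=False).cumsum()
df['C_forward'] = (
    df.groupby(forward_id)['B'].transform('sum').where(df['A'].notna())
)

print(df)
```

Both columns should match on this data; the forward variant avoids reversing the frame at the cost of a `shift`.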