Search code examples
pandasnull

Pandas: Add null values in two columns to null


I have a table like the one below.

A B C
1.0 1.5 Baseline
NaN NaN Baseline
2.0 NaN Baseline

I want to insert a column after summing the first two columns, and want a result like below.

A B Sum C
1.0 1.5 2.5 Baseline
NaN NaN NaN Baseline
2.0 NaN 2.0 Baseline

The code I'm trying out is, df.insert(2, 'Sum', df[['A', 'B']].sum(axis=1))

But I'm getting the below result.

A B Sum C
1.0 1.5 2.5 Baseline
NaN NaN 0.0 Baseline
2.0 NaN 2.0 Baseline

I would like to keep the sum of NaNs as a NaN in the sum column. Is there any short way to do this?

Some posts suggest, I can do (df['A'].fillna('') + df['B'].fillna('')).replace('', np.nan)). But float columns do not allow replacing Nan by ''. And I have multiple columns to sum and want to avoid a long formula. Will be grateful for any suggestion!


Solution

  • You can use np.where:

    res = np.where(df[['A', 'B']].notna().any(axis=1),
                   df[['A', 'B']].sum(axis=1), np.NaN)
    df.insert(2, 'Sum', res)
    

    Output:

    >>> df
         A    B  Sum         C
    0  1.0  1.5  2.5  Baseline
    1  NaN  NaN  NaN  Baseline
    2  2.0  NaN  2.0  Baseline
    

    Without numpy, you can do:

    res = df[['A', 'B']].sum(axis=1)[df[['A', 'B']].notna().any(axis=1)]
    df.insert(2, 'Sum', res)
    

    Another way:

    res = df[['A', 'B']].fillna(0, limit=1, axis=1).sum(axis=1, skipna=False)