I have a table like the one below.
A | B | C |
---|---|---|
1.0 | 1.5 | Baseline |
NaN | NaN | Baseline |
2.0 | NaN | Baseline |
I want to insert a column after summing the first two columns, and want a result like below.
A | B | Sum | C |
---|---|---|---|
1.0 | 1.5 | 2.5 | Baseline |
NaN | NaN | NaN | Baseline |
2.0 | NaN | 2.0 | Baseline |
The code I'm trying out is, df.insert(2, 'Sum', df[['A', 'B']].sum(axis=1))
But I'm getting the below result.
A | B | Sum | C |
---|---|---|---|
1.0 | 1.5 | 2.5 | Baseline |
NaN | NaN | 0.0 | Baseline |
2.0 | NaN | 2.0 | Baseline |
I would like to keep the sum of NaNs as a NaN in the sum column. Is there any short way to do this?
Some posts suggest, I can do (df['A'].fillna('') + df['B'].fillna('')).replace('', np.nan)). But float columns do not allow replacing Nan by ''. And I have multiple columns to sum and want to avoid a long formula. Will be grateful for any suggestion!
You can use np.where
:
res = np.where(df[['A', 'B']].notna().any(axis=1),
df[['A', 'B']].sum(axis=1), np.NaN)
df.insert(2, 'Sum', res)
Output:
>>> df
A B Sum C
0 1.0 1.5 2.5 Baseline
1 NaN NaN NaN Baseline
2 2.0 NaN 2.0 Baseline
Without numpy, you can do:
res = df[['A', 'B']].sum(axis=1)[df[['A', 'B']].notna().any(axis=1)]
df.insert(2, 'Sum', res)
Another way:
res = df[['A', 'B']].fillna(0, limit=1, axis=1).sum(axis=1, skipna=False)