I have a DataFrame with key and value columns. value is sometimes NA:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': np.random.randint(0, 1_000_000, 100_000_000),
    'value': np.random.randint(0, 1_000, 100_000_000).astype(float),
})
df.loc[df.value == 0, 'value'] = np.nan
I want to group by key and sum over the value column. If any value is NA for a key, I want the sum for that key to be NA.
The code in this answer takes 35.7 seconds on my machine:
df.groupby('key')['value'].apply(np.array).apply(np.sum)
This is a lot slower than what is theoretically possible. The built-in Pandas SeriesGroupBy.sum takes 6.31 seconds on my machine:
df.groupby('key')['value'].sum()
but it always skips NaNs rather than propagating them (see this GitHub issue).
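For illustration, a small toy frame (not the 100M-row one above) shows the built-in sum silently dropping the NaN instead of propagating it:

import numpy as np
import pandas as pd

toy = pd.DataFrame({'key': ['a', 'a', 'b', 'b'],
                    'value': [1.0, 2.0, 3.0, np.nan]})

# key 'b' should be NaN under the desired semantics, but the built-in sum returns 3.0
print(toy.groupby('key')['value'].sum())
# key
# a    3.0
# b    3.0
# Name: value, dtype: float64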
What code can I write to get comparable performance to the built-in operator while still handling NaNs?
One workaround could be to replace the NaNs with Inf: any group containing an Inf then sums to Inf, which can be mapped back to NaN afterwards:
df.fillna({'value': np.inf}).groupby('key')['value'].sum().replace(np.inf, np.nan)
A faster alternative is to fill only the value Series, which avoids copying the whole DataFrame:
df['value'].fillna(np.inf).groupby(df['key']).sum().replace(np.inf, np.nan)
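As a quick sanity check on toy data (assumed here purely for illustration, not the original frame), the workaround matches the NA-propagating result of the apply-based version from the question:

import numpy as np
import pandas as pd

toy = pd.DataFrame({'key': ['a', 'a', 'b', 'b'],
                    'value': [1.0, 2.0, 3.0, np.nan]})

# NA-propagating reference (the slow apply-based version from the question)
expected = toy.groupby('key')['value'].apply(np.array).apply(np.sum)

# Inf workaround
result = toy['value'].fillna(np.inf).groupby(toy['key']).sum().replace(np.inf, np.nan)

print(expected)  # a -> 3.0, b -> NaN
print(result)    # a -> 3.0, b -> NaN

One caveat: the trick assumes the data contains no genuine np.inf values, since those would also end up as NaN in the output.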
Example output:
key
0 45208.0
1 NaN
2 62754.0
3 50001.0
4 51073.0
...
99995 55102.0
99996 43048.0
99997 49497.0
99998 43301.0
99999 NaN
Name: value, Length: 100000, dtype: float64
Timing (on 10M rows):
# original sum
743 ms ± 81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Inf workaround
918 ms ± 70.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Inf workaround (alternative)
773 ms ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# custom apply with numpy
5.99 s ± 263 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
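These numbers look like IPython %timeit output; a rough sketch of how they could be reproduced follows (the 10M-row setup, the 100,000 distinct keys, and the %timeit calls are assumptions, scaled down from the question):

import numpy as np
import pandas as pd

# Assumed 10M-row sample, analogous to the frame in the question
n = 10_000_000
df = pd.DataFrame({
    'key': np.random.randint(0, 100_000, n),
    'value': np.random.randint(0, 1_000, n).astype(float),
})
df.loc[df.value == 0, 'value'] = np.nan

# In IPython/Jupyter, each candidate can then be timed, e.g.:
# %timeit df.groupby('key')['value'].sum()                                                    # original sum
# %timeit df.fillna({'value': np.inf}).groupby('key')['value'].sum().replace(np.inf, np.nan)  # Inf workaround
# %timeit df['value'].fillna(np.inf).groupby(df['key']).sum().replace(np.inf, np.nan)         # alternative
# %timeit df.groupby('key')['value'].apply(np.array).apply(np.sum)                            # custom apply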