I have a table and want to know the null counts but I get an unexpected error.
The table is below:
outage_id isp_name start_time end_time affected_area reported_issues
1 ISP D 2023-01-01 00:00:00 2023-01-01 01:00:00 Area 6 39
2 ISP E 2023-01-01 01:00:00 2023-01-01 05:00:00 Area 3 47
3 ISP C 2023-01-01 02:00:00 2023-01-01 03:00:00 Area 4 60
4 ISP E 2023-01-01 03:00:00 2023-01-01 04:00:00 Area 4 63
5 ISP E 2023-01-01 04:00:00 NULL Area 3 17
6 ISP B 2023-01-01 05:00:00 2023-01-01 06:00:00 Area 3 36
7 ISP C 2023-01-01 06:00:00 NULL Area 3 36
8 ISP C 2023-01-01 07:00:00 NULL Area 4 30
9 ISP C 2023-01-01 08:00:00 2023-01-01 09:00:00 Area 7 39
10 ISP E 2023-01-01 09:00:00 2023-01-01 12:00:00 Area 4 37
11 ISP D 2023-01-01 10:00:00 2023-01-01 11:00:00 Area 9 73
12 ISP C 2023-01-01 11:00:00 2023-01-01 13:00:00 Area 1 78
13 ISP E 2023-01-01 12:00:00 NULL Area 8 70
14 ISP B 2023-01-01 13:00:00 2023-01-01 17:00:00 Area 7 57
15 ISP D 2023-01-01 14:00:00 2023-01-01 15:00:00 Area 2 28
16 ISP B 2023-01-01 15:00:00 NULL Area 8 13
17 ISP D 2023-01-01 16:00:00 2023-01-01 20:00:00 Area 1 44
18 ISP E 2023-01-01 17:00:00 2023-01-01 21:00:00 Area 9 73
19 ISP A 2023-01-01 18:00:00 2023-01-01 21:00:00 Area 9 58
20 ISP D 2023-01-01 19:00:00 2023-01-01 22:00:00 Area 2 26
21 ISP B 2023-01-01 20:00:00 NULL Area 7 53
22 ISP E 2023-01-01 21:00:00 2023-01-01 22:00:00 Area 3 39
23 ISP D 2023-01-01 22:00:00 2023-01-02 01:00:00 Area 7 55
24 ISP A 2023-01-01 23:00:00 2023-01-02 02:00:00 Area 9 15
25 ISP A 2023-01-02 00:00:00 NULL Area 4 46
26 ISP C 2023-01-02 01:00:00 2023-01-02 03:00:00 Area 1 33
27 ISP C 2023-01-02 02:00:00 NULL Area 2 55
28 ISP B 2023-01-02 03:00:00 NULL Area 1 62
29 ISP D 2023-01-02 04:00:00 NULL Area 5 69
30 ISP D 2023-01-02 05:00:00 2023-01-02 09:00:00 Area 5 72
31 ISP C 2023-01-02 06:00:00 2023-01-02 10:00:00 Area 7 94
32 ISP D 2023-01-02 07:00:00 2023-01-02 10:00:00 Area 9 41
33 ISP D 2023-01-02 08:00:00 2023-01-02 11:00:00 Area 9 96
34 ISP A 2023-01-02 09:00:00 2023-01-02 13:00:00 Area 3 42
35 ISP C 2023-01-02 10:00:00 2023-01-02 13:00:00 Area 3 76
36 ISP E 2023-01-02 11:00:00 2023-01-02 13:00:00 Area 3 27
37 ISP C 2023-01-02 12:00:00 2023-01-02 14:00:00 Area 4 34
38 ISP E 2023-01-02 13:00:00 2023-01-02 14:00:00 Area 8 63
39 ISP A 2023-01-02 14:00:00 NULL Area 6 67
40 ISP B 2023-01-02 15:00:00 2023-01-02 19:00:00 Area 8 76
41 ISP D 2023-01-02 16:00:00 NULL Area 1 55
My code is below:

```python
df.end_time.isnull().groupby('isp_name').sum()
# or
df.end_time.isnull().groupby('isp_name').transform('sum')
```
It raises `KeyError: 'isp_name'`.
I do have an 'isp_name' column, so I don't understand why I get this error.
Thanks
After you slice with `df['end_time'].isnull()`, you only have a Series; the `isp_name` column is no longer available to group by. Group by the column passed as a Series instead:

```python
df['end_time'].isnull().groupby(df['isp_name']).sum()
```
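The key point is that `groupby` accepts any Series aligned on the same index, not just a column name. A minimal self-contained sketch with toy data (the names are illustrative, not your actual table):

```python
import pandas as pd

# Toy frame with the same shape of problem: a group column and a
# column containing missing values.
df = pd.DataFrame({
    "isp_name": ["A", "A", "B", "B", "B"],
    "end_time": [pd.Timestamp("2023-01-01"), None, None, None,
                 pd.Timestamp("2023-01-02")],
})

mask = df["end_time"].isnull()               # boolean Series, index 0..4
counts = mask.groupby(df["isp_name"]).sum()  # aligned by index; True sums as 1
print(counts)  # A -> 1, B -> 2
```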
Or use a custom aggregation function:

```python
df.groupby('isp_name')['end_time'].agg(lambda x: x.isnull().sum())
```
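An equivalent trick, if you prefer built-in aggregations: `count()` skips missing values while `size()` does not, so their per-group difference is the null count. A sketch on toy data (names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "isp_name": ["A", "A", "B"],
    "end_time": [pd.Timestamp("2023-01-01"), None, None],
})

g = df.groupby("isp_name")["end_time"]
# count() excludes NaT/NaN, size() includes them, so the difference
# is exactly the number of missing values in each group.
nulls = g.size() - g.count()
print(nulls)  # A -> 1, B -> 1
```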
Output (from your table, each ISP's number of rows with a NULL `end_time`):

```
isp_name
ISP A    2
ISP B    3
ISP C    3
ISP D    2
ISP E    2
Name: end_time, dtype: int64
```
Last option, for the most flexibility, assign a new column first (named `is_null` here, since it flags the missing values):

```python
(df.assign(is_null=lambda d: d['end_time'].isnull())
   .groupby('isp_name')['is_null'].sum()
)
```
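As a sanity check, all three approaches should agree on any frame. A quick self-contained sketch with toy data (the `is_null` helper column is an assumption for illustration, not pandas API):

```python
import pandas as pd

# Toy data only; column names mirror the question's table.
df = pd.DataFrame({
    "isp_name": ["A", "B", "B", "A"],
    "end_time": [None, pd.Timestamp("2023-01-01"), None, None],
})

# All three approaches count missing end_time values per ISP.
a = df["end_time"].isnull().groupby(df["isp_name"]).sum()
b = df.groupby("isp_name")["end_time"].agg(lambda x: x.isnull().sum())
c = (df.assign(is_null=lambda d: d["end_time"].isnull())
       .groupby("isp_name")["is_null"].sum())

print(list(a), list(b), list(c))  # the three results agree
```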