I have a list of server IP addresses along ith the error code. The code starting with 2 or 3 is good and if it starts with 4 or 5 then it's bad.
ip code value
0 10.0.1.125 200 86689
1 10.0.1.125 206 45
2 10.0.1.125 301 13
3 10.0.1.125 302 314
4 10.0.1.125 304 5115
5 10.0.1.125 400 16
6 10.0.1.125 401 212
7 10.0.1.125 403 26
8 10.0.1.125 404 12103
9 10.0.1.125 405 19
10 10.0.1.125 416 2
11 10.0.1.125 460 390
12 10.0.1.125 500 1176
13 10.0.1.125 502 1
14 10.0.1.125 503 26369
I need to calculate the percent of bad code count against the good. In this case it is 0.43 (40314 / 92176)
There are several such servers in the list and for each server the ratio should be calculated. The expected result is something like...
10.0.1.125 0.43
10.0.1.126 0.91
10.0.1.127 0.27
I realize that the "value" column should be used, so update my previous approach to first map
the code, then groupby.sum
, finally compute the ratio:
codes = {'2': 'good', '3': 'good', '4': 'bad', '5': 'bad'}
out = (df.assign(status=df['code'].astype(str).str[0].map(codes))
.groupby(['ip', 'status'])['value'].sum()
.unstack('status')
.eval('ratio = bad/good')
)
Or with pivot_table
:
out = (
df.assign(status=df['code'].astype(str).str[0].map(codes))
.pivot_table(index='ip', columns='status', values='value', aggfunc='sum')
.eval('ratio = bad/good')
.rename_axis(columns=None).reset_index()
)
Output:
ip bad good ratio
0 10.0.1.125 40314 92176 0.437359
You can use a crosstab
after a map
to convert the codes to good/bad:
# set up mapper for codes
codes = {'2': 'good', '3': 'good', '4': 'bad', '5': 'bad'}
# count the good/bad codes per ip
tmp = pd.crosstab(df['ip'], df['code'].astype(str).str[0].map(codes))
# compute the ratio
tmp['bad']/tmp['good']
NB. if the codes are integers and always 3 digits, you can also map with df['code'].floordiv(100).map(codes)
, just change the keys of d
to be integers.
Output:
ip
10.0.1.125 2.0
dtype: float64
Alternatively, with a groupby.sum
:
(df.assign(good=df['code'].astype(str).str.match('[23]'),
bad=df['code'].astype(str).str.match('[45]'),
)
.groupby('ip', as_index=False)[['good', 'bad']].sum()
.eval('ratio = bad/good')
)
Output:
ip good bad ratio
0 10.0.1.125 5 10 2.0