Search code examples
pandaslibreoffice-calc

calculate ratio of 2 parameters group by each server IP


I have a list of server IP addresses along ith the error code. The code starting with 2 or 3 is good and if it starts with 4 or 5 then it's bad.

            ip  code  value
0   10.0.1.125   200  86689
1   10.0.1.125   206     45
2   10.0.1.125   301     13
3   10.0.1.125   302    314
4   10.0.1.125   304   5115
5   10.0.1.125   400     16
6   10.0.1.125   401    212
7   10.0.1.125   403     26
8   10.0.1.125   404  12103
9   10.0.1.125   405     19
10  10.0.1.125   416      2
11  10.0.1.125   460    390
12  10.0.1.125   500   1176
13  10.0.1.125   502      1
14  10.0.1.125   503  26369

I need to calculate the percent of bad code count against the good. In this case it is 0.43 (40314 / 92176)

There are several such servers in the list and for each server the ratio should be calculated. The expected result is something like...

10.0.1.125 0.43
10.0.1.126 0.91
10.0.1.127 0.27

Solution

  • update

    I realize that the "value" column should be used, so update my previous approach to first map the code, then groupby.sum, finally compute the ratio:

    codes = {'2': 'good', '3': 'good', '4': 'bad', '5': 'bad'}
    
    out = (df.assign(status=df['code'].astype(str).str[0].map(codes))
             .groupby(['ip', 'status'])['value'].sum()
             .unstack('status')
             .eval('ratio = bad/good')
           )
    

    Or with pivot_table:

    out = (
     df.assign(status=df['code'].astype(str).str[0].map(codes))
       .pivot_table(index='ip', columns='status', values='value', aggfunc='sum')
       .eval('ratio = bad/good')
       .rename_axis(columns=None).reset_index()
    )
    

    Output:

               ip    bad   good     ratio
    0  10.0.1.125  40314  92176  0.437359
    

    previous answer (not the wanted logic)

    You can use a crosstab after a map to convert the codes to good/bad:

    # set up mapper for codes
    codes = {'2': 'good', '3': 'good', '4': 'bad', '5': 'bad'}
    
    # count the good/bad codes per ip
    tmp = pd.crosstab(df['ip'], df['code'].astype(str).str[0].map(codes))
    
    # compute the ratio
    tmp['bad']/tmp['good']
    

    NB. if the codes are integers and always 3 digits, you can also map with df['code'].floordiv(100).map(codes), just change the keys of d to be integers.

    Output:

    ip
    10.0.1.125    2.0
    dtype: float64
    

    Alternatively, with a groupby.sum:

    (df.assign(good=df['code'].astype(str).str.match('[23]'),
               bad=df['code'].astype(str).str.match('[45]'),
              )
       .groupby('ip', as_index=False)[['good', 'bad']].sum()
       .eval('ratio = bad/good')
    )
    

    Output:

               ip  good  bad  ratio
    0  10.0.1.125     5   10    2.0