Search code examples
pythonpandasvalueerror

Division of sucessive columns by one another in a dataframe, subject to a condition


I am an using Python occasionnally and not very good at it. I am stuck with the following issue which I am sure is easy to solve... Let's assume we have the following 'df_initial' dataframe as an input.

print (df)
     Product       type    0    1    2    3    4    5    6
0   BURGUNDY     actual  645  600  720  640  500  320  300
1   BORDEAUX     actual  730  730  710  500  500  450  450
2  CHAMPAGNE     actual  320  260  280  100  100  100    0
3   BURGUNDY  objective  800  760  720  640  600  560  520
4   BORDEAUX  objective  750  730  710  690  670  550  630
5  CHAMPAGNE  objective  500  490  480    0    0    0    0

I create the following pivot table :

df_pivot = df_initial.pivot_table(index='Product', columns=['type'],
                                  aggfunc=np.sum)

enter image description here

I would like to calculate a ratio actual/objective for each product at each location 1, 2, 3, ..., 6 with the additional condition that the ratio should be set to 100% if the division is impossible (for any row where the value in the 'objective' column is equal to zero).

ratio_df = pd.DataFrame(index=df_pivot.index)

I define the following function :

def division(a, b):
    if b == 0:
        return 1
    else:
        return a/b

Then I try to apply the function :

for col in df_pivot.columns:
    if col[1] == 'objective':
        continue
    ratio_df[col[0]] = division (df_pivot[(col[0], 'actual')] , df_pivot[(col[0], 'objective')])

This returns :

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I understand why this cannot work but I do not know how to write the script correctly.

I expect a table like this :

enter image description here

I tried to change the function into :

def division(a, b):
    if b.values[0] == 0:
        return 1
    else:
        return a / b.values[0]

and then t:

ratio_df[col[0]] = df_pivot[col].apply(division, b=df_pivot.loc[:, (col[0], 'objective')])

but this does not produce the correct output.


Solution

  • You can use a groupby.sum, then slice with loc, replace the Inf by 1 and optionally format with applymap:

    tmp = df_initial.groupby(['type', 'Product']).sum()
    
    ratio_df = (tmp.loc['actual']
                .div(tmp.loc['objective'])
                .replace(np.inf, 1).fillna(1)
                .applymap('{:.0%}'.format) # optional (to format as %)
                )
    

    Or with your initial pivot_table with swaplevel for easier slicing:

    df_pivot = (df_initial
                .pivot_table(index='Product', columns='type', aggfunc='sum')
                .swaplevel(axis=1)
               )
    
    out = (df_pivot['actual']
           .div(df_pivot['objective'])
           .replace(np.inf, 1).fillna(1)
           .applymap('{:.0%}'.format)
          )
    

    NB. for recent pandas versions, replace applymap by map.

    Output:

                 0     1     2     3     4     5     6
    Product                                          
    BORDEAUX   97%  100%  100%   72%   75%   82%   71%
    BURGUNDY   81%   79%  100%  100%   83%   57%   58%
    CHAMPAGNE  64%   53%   58%  100%  100%  100%  100%