I am an using Python occasionnally and not very good at it. I am stuck with the following issue which I am sure is easy to solve... Let's assume we have the following 'df_initial' dataframe as an input.
print (df)
Product type 0 1 2 3 4 5 6
0 BURGUNDY actual 645 600 720 640 500 320 300
1 BORDEAUX actual 730 730 710 500 500 450 450
2 CHAMPAGNE actual 320 260 280 100 100 100 0
3 BURGUNDY objective 800 760 720 640 600 560 520
4 BORDEAUX objective 750 730 710 690 670 550 630
5 CHAMPAGNE objective 500 490 480 0 0 0 0
I create the following pivot table :
df_pivot = df_initial.pivot_table(index='Product', columns=['type'],
aggfunc=np.sum)
I would like to calculate a ratio actual/objective for each product at each location 1, 2, 3, ..., 6 with the additional condition that the ratio should be set to 100% if the division is impossible (for any row where the value in the 'objective' column is equal to zero).
ratio_df = pd.DataFrame(index=df_pivot.index)
I define the following function :
def division(a, b):
if b == 0:
return 1
else:
return a/b
Then I try to apply the function :
for col in df_pivot.columns:
if col[1] == 'objective':
continue
ratio_df[col[0]] = division (df_pivot[(col[0], 'actual')] , df_pivot[(col[0], 'objective')])
This returns :
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I understand why this cannot work but I do not know how to write the script correctly.
I expect a table like this :
I tried to change the function into :
def division(a, b):
if b.values[0] == 0:
return 1
else:
return a / b.values[0]
and then t:
ratio_df[col[0]] = df_pivot[col].apply(division, b=df_pivot.loc[:, (col[0], 'objective')])
but this does not produce the correct output.
You can use a groupby.sum
, then slice with loc
, replace
the Inf
by 1 and optionally format with applymap
:
tmp = df_initial.groupby(['type', 'Product']).sum()
ratio_df = (tmp.loc['actual']
.div(tmp.loc['objective'])
.replace(np.inf, 1).fillna(1)
.applymap('{:.0%}'.format) # optional (to format as %)
)
Or with your initial pivot_table
with swaplevel
for easier slicing:
df_pivot = (df_initial
.pivot_table(index='Product', columns='type', aggfunc='sum')
.swaplevel(axis=1)
)
out = (df_pivot['actual']
.div(df_pivot['objective'])
.replace(np.inf, 1).fillna(1)
.applymap('{:.0%}'.format)
)
NB. for recent pandas versions, replace applymap
by map
.
Output:
0 1 2 3 4 5 6
Product
BORDEAUX 97% 100% 100% 72% 75% 82% 71%
BURGUNDY 81% 79% 100% 100% 83% 57% 58%
CHAMPAGNE 64% 53% 58% 100% 100% 100% 100%