Afternoon All,
I run a script to create a crosstab style report from which calculations are added to the output, this assumes all fields used in the calculations are present. This assumption is incorrect, especially early in the month where a number of Results
are not yet present for each Client
. I would like to add some validation prior to the run.
There are no issues if and only if the following values are ALL present for field Result
for each Client
Covered
Customer Reject
Customer Timeout
Dealer Reject
Dealer Timeout
Done
No RFQ
Tied Covered
Tied Done
Tied Traded Away
Traded Away
The case where all values are present and the desired output is displayed:
df = {'Client': ['A', 'A', 'A',
'B', 'B', 'B', 'B','B','B','B','B',
'C',
'D','D','D','D','D','D','D','D','D','D','D'],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ',
'No RFQ',
'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}
df = pd.DataFrame.from_dict(df)
print(df)
Client Result
0 A Covered
1 A Customer Reject
2 A Customer Timeout
3 B Dealer Reject
4 B Dealer Timeout
5 B Done
6 B Tied Covered
7 B Tied Done
8 B Tied Traded Away
9 B Traded Away
10 B No RFQ
11 C No RFQ
12 D Covered
13 D Customer Reject
14 D Customer Timeout
15 D Dealer Reject
16 D Dealer Timeout
17 D Done
18 D Tied Covered
19 D Tied Done
20 D Tied Traded Away
21 D Traded Away
22 D No RFQ
df = pd.crosstab(df.Client,df.Result,margins=True, margins_name='Total_Result_Per_Client').drop('Total_Result_Per_Client')
# Percentage_Priced_Back_Count = (Total - Dealer Reject - Dealer_Timeout) / (Total - Customer Reject)
df['Percentage_Priced_Back_Count'] = ( df['Total_Result_Per_Client'] - df['Dealer Reject'] - df['Dealer Timeout'] ) / ( df['Total_Result_Per_Client'] - df['Customer Reject'] )
# Hit_Rate_Count = (Total – (Covered + Tied Covered) – Customer Reject – Dealer Reject – Dealer Timeout – (Traded Away + Tied Traded Away)) / (Total – Dealer Reject – Dealer Timeout)
df['Hit_Rate_Count'] = ( df['Total_Result_Per_Client'] - df['Covered'] - df['Tied Covered'] - df['Customer Reject'] - df['Dealer Reject'] - df['Dealer Timeout'] - df['Traded Away'] - df['Tied Traded Away']) / \
( df['Total_Result_Per_Client'] - df['Dealer Reject'] - df['Dealer Timeout'] )
df = df.stack().reset_index(name='value')
df2 = pd.DataFrame([[''] * len(df.columns), df.columns], columns=df.columns)
df1= (df.groupby('Client', group_keys=False)
.apply(lambda d: d.append(df2))
.iloc[:-2]
.reset_index(drop=True))
print (df1)
Sample output:
15 Client Result value
16 B Covered 0
17 B Customer Reject 0
18 B Customer Timeout 0
19 B Dealer Reject 1
20 B Dealer Timeout 1
21 B Done 1
22 B No RFQ 1
23 B Tied Covered 1
24 B Tied Done 1
25 B Tied Traded Away 1
26 B Traded Away 1
27 B Total_Result_Per_Client 8
28 B Percentage_Priced_Back_Count 0.75
29 B Hit_Rate_Count 0.5
The case where a value in Result
(used in the calculation) for a Client
is not present, an error is raised. In the example below Traded Away
is removed for Client
B
and D
. The error is raised in the pd.crosstab
call:
df = {'Client': ['A', 'A', 'A',
'B', 'B', 'B', 'B','B','B','B',
'C',
'D','D','D','D','D','D','D','D','D','D'],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ',
'No RFQ',
'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ']}
df = pd.DataFrame.from_dict(df)
print(df)
df = pd.crosstab(df.Client,df.Result,margins=True, margins_name='Total_Result_Per_Client').drop('Total_Result_Per_Client')
KeyError: 'Traded Away'
i.e. if the known Result
values are not present for any Client
add it (with a zero values) before the function call so all calculations can be completed.
I suggest use DataFrame.reindex
for add all missing values from list and fill them by 0
:
vals = ['Covered',
'Customer Reject',
'Customer Timeout',
'Dealer Reject',
'Dealer Timeout',
'Done',
'No RFQ',
'Tied Covered',
'Tied Done',
'Tied Traded Away',
'Traded Away']
df = (pd.crosstab(df.Client,
df.Result,
margins=True,
margins_name='Total_Result_Per_Client')
.drop('Total_Result_Per_Client')
.reindex(vals + ['Total_Result_Per_Client'], axis=1, fill_value=0))
print(df)
Result Covered Customer Reject Customer Timeout Dealer Reject \
Client
A 1 1 1 0
B 0 0 0 1
C 0 0 0 0
D 1 1 1 1
Result Dealer Timeout Done No RFQ Tied Covered Tied Done \
Client
A 0 0 0 0 0
B 1 1 1 1 1
C 0 0 1 0 0
D 1 1 1 1 1
Result Tied Traded Away Traded Away Total_Result_Per_Client
Client
A 0 0 3
B 1 0 7
C 0 0 1
D 1 0 10