Search code examples
pythonpandasdataframenumpymissing-data

Parsing through a pandas Dataframe and applying rules based on different conditions


I have a made up dataframe to replicate the real problem I am trying to work through in python in reconciling account rates held on mainframe system to what they should be set to from rates tables.

I have 3 tables but they have been merged into one dataframe for this example.

  1. Account Information with Interest Rate Conditions (first 5 columns of df). These rates are the actual rates applied to the accounts and need to be matched off to ensure they were set up correctly
  2. Non Std Rates - Certain accounts will apply these non-std rates once certain conditions are met
  3. Std Rates- Same as above these will apply once certain conditions are met
import pandas as pd
import numpy as np
df = pd.DataFrame([[1234567890,3.5,'GG','N','N','Y',np.NaN,np.NaN,'N','N',3.5,'GG'],
                    [7854567890,np.NaN,'GG','N','N','N',np.NaN,'GG','N','N',3.5,'GG'],
                    [9876542190,3.5,'FF','N','N','Y',np.NaN,np.NaN,'N','Y',3.5,'FI'],
                    [9632587415,3.5,'GG','N','N','N',3,'GG','N','N',3.5,'GG']],
columns = ['Account','Account_Spread','Account_Swing','indict_1','indict_2','Negotiated_Rate',
           'Non_std_Spread','Non_std_Code','Non_std_indict_1','Non_std_indict_2','Std_Spread','Std_Swing'])
df

Conditions:

  • Account Data (Account Spread and Account Swing) should only be matched against the non-std rates if columns "Non_std_Spread" or "Non_std_Code" or both are populated and the "Negogiated_Rate" column is set to N.
  • Account Data (Spread and Swing) should only be matched against the std rates if columns "Non_std_Spread" or "Non_std_Code" are both null and the "Negogiated_Rate" column is set to N or Y.
  • for those accounts where the above indicator is set to Y, the indicator in the non-std data "Non_std_indict_1"and "Non_std_indict_2" need to be compared to "indict_1" and "indict_2" respectively and report matches and mismatches.

The Desired Outcome:

  • A new column added to dataframe identifying if a match or mismatch was detected comparing the account spread and code to either its equivalent in the non-std rates or std rates. Something like "MatchOnNSR" or "MismatchOnSR".
  • Another column or columns comparing if a mismatch occurred between the indicator columns when the Negogiated_Rate was flagged as Y

Sample Output with expected results:

df=pd.DataFrame([[1234567890,3.5,'GG','N','N','Y',np.NaN,np.NaN,'N','N',3.5,'GG','MatchOnSR',True,True],
                    [7854567890,np.NaN,'GG','N','N','N',np.NaN,'GG','N','N',3.5,'GG','MatchOnNSR',np.NaN,np.NaN],
                    [9876542190,3.5,'FF','N','N','Y',np.NaN,np.NaN,'N','Y',3.5,'FI','MismatchOnSR',True,False],
                    [9632587415,3.5,'GG','N','N','N',3,'GG','N','N',3.5,'GG','MismatchOnSNR',np.NaN,np.NaN]],
columns = ['Account','Account_Spread','Account_Swing','indict_1','indict_2','Negotiated_Rate',
           'Non_std_Spread','Non_std_Code','Non_std_indict_1','Non_std_indict_2','Std_Spread','Std_Swing','Is_Match','Match_indict_1','Match_indict_2'])
df

At present I don't have anything to share to work this problem out. I'm struggling to know what the best approach would be to get even started. Any help is much appreciated.


Solution

  • This answer is a response to some version related issues raised in my other one

    You could try using masks thereby 'cond_...', something like:

    cond_nan = df['Non_std_Spread'].isna() & df['Non_std_Code'].isna()
    
    df.loc[cond_nan,'match_indict_1'] = df.loc[cond_nan,'indict_1'] == df.loc[cond_nan,'Non_std_indict_1']
    df.loc[cond_nan,'match_indict_2'] = df.loc[cond_nan,'indict_2'] == df.loc[cond_nan,'Non_std_indict_2']
    df.loc[cond_nan,'Is_Match'] = np.where(
        (df.loc[cond_nan,'Account_Spread'] == df.loc[cond_nan,'Std_Spread']) & (df.loc[cond_nan,'Account_Swing'] == df.loc[cond_nan,'Std_Swing']),
        "MatchOnSR", "MismatchOnNSR",
    )
    
    
    cond_no_nan = ~df['Non_std_Spread'].isna() & ~df['Non_std_Code'].isna() & (df['Negotiated_Rate'] == 'N')
    
    df.loc[cond_no_nan,'Is_Match'] = np.where(
        (df.loc[cond_no_nan,'Account_Spread'] == df.loc[cond_no_nan,'Non_std_Spread']) & (df.loc[cond_no_nan,'Account_Swing'] == df.loc[cond_no_nan,'Non_std_Code']),
        "MatchOnSR", "MismatchOnNSR",
    )