pandas, sklearn-pandas

Best way to search for 3 comparisons in a Bank Note dataset


So, I need to create a classifier with 3 simple comparisons to detect a fake bill, based on something like this pseudocode:

Assume you are examining a bill with features f_1, f_2, f_3 and f_4. Your rule may look like this:

if (f_1 > 4) and (f_2 > 8) and (f_4 < 25):
    x = "good"
else:
    x = "fake"

What is best to use for this - a lambda? I started with this:

distdf = {
    'f1': banknote['variance'],
    'f2': banknote['skewness'],
    'f3': banknote['curtosis'],
    'f4': banknote['entropy'],
}

But I am not sure how to proceed. This uses the well-known banknote authentication dataset, BankNote_Authentication.csv, which can be found on Kaggle.


Solution

  • Use np.where to check all three conditions at once and assign the corresponding label. There is no need to alias the columns to f1, f2, f3, etc.:

    banknote_df['classifier'] = np.where(
        (banknote_df['variance'] > 4) &
        (banknote_df['skewness'] > 8) &
        (banknote_df['entropy'] < 25),
        'good',
        'fake'
    )
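
The call above depends on the element-wise `&` operator and on the parentheses around each comparison. The following is a minimal sketch of why, using a two-row frame whose values are made up purely for illustration; the names `df`, `mask`, `labels`, and `and_failed` are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the values are invented for this sketch.
df = pd.DataFrame({
    'variance': [5.0, 3.0],
    'skewness': [9.0, 9.0],
    'entropy': [1.0, 1.0],
})

# Each comparison yields a boolean Series; & combines them row by row.
# The parentheses are needed because & binds more tightly than > and <.
mask = (df['variance'] > 4) & (df['skewness'] > 8) & (df['entropy'] < 25)
labels = np.where(mask, 'good', 'fake')

# Python's `and` asks for one truth value for the whole Series,
# which pandas refuses to provide, so it raises ValueError.
try:
    (df['variance'] > 4) and (df['skewness'] > 8)
    and_failed = False
except ValueError:
    and_failed = True

print(list(labels), and_failed)
```

This is why the scalar pseudocode in the question, which uses `and`, cannot be applied to whole columns unchanged.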
    

    Sample Program:

    import numpy as np
    import pandas as pd
    
    banknote_df = pd.DataFrame({
        'variance': [2.2156, 4.4795, 1.866, 3.47578, 0.697854],
        'skewness': [9.45647, 8.54688, -5.4568, 6.15258, -3.4564],
        'curtosis': [-1.12245, -1.2454, 2.75, -6.5468, 3.45875],
        'entropy': [-0.424514, -2.45687, 0.1230152, -6.1254, -0.45241],
        'class': [0, 0, 0, 0, 0]
    })
    
    banknote_df['classifier'] = np.where(
        (banknote_df['variance'] > 4) &
        (banknote_df['skewness'] > 8) &
        (banknote_df['entropy'] < 25),
        'good',
        'fake'
    )
    print(banknote_df)
    

    Output:

       variance  skewness  curtosis   entropy  class classifier
    0  2.215600   9.45647  -1.12245 -0.424514      0       fake
    1  4.479500   8.54688  -1.24540 -2.456870      0       good
    2  1.866000  -5.45680   2.75000  0.123015      0       fake
    3  3.475780   6.15258  -6.54680 -6.125400      0       fake
    4  0.697854  -3.45640   3.45875 -0.452410      0       fake
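
Since the question asks about a lambda, here is a hedged sketch of what the row-wise version could look like with `DataFrame.apply`, next to the `np.where` version for comparison. It reuses the first three rows of the sample data above; the names `row_wise` and `vectorized` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# First three rows of the sample data from the answer above.
banknote_df = pd.DataFrame({
    'variance': [2.2156, 4.4795, 1.866],
    'skewness': [9.45647, 8.54688, -5.4568],
    'curtosis': [-1.12245, -1.2454, 2.75],
    'entropy': [-0.424514, -2.45687, 0.1230152],
})

# Row-wise lambda: each row is a Series of scalars, so plain `and` works here.
row_wise = banknote_df.apply(
    lambda row: 'good'
    if row['variance'] > 4 and row['skewness'] > 8 and row['entropy'] < 25
    else 'fake',
    axis=1,
)

# Vectorized equivalent, as in the answer.
vectorized = np.where(
    (banknote_df['variance'] > 4) &
    (banknote_df['skewness'] > 8) &
    (banknote_df['entropy'] < 25),
    'good',
    'fake',
)

print(row_wise.tolist())
print(list(vectorized))
```

Both approaches assign the same labels. The lambda calls a Python function once per row, while `np.where` operates on whole columns, so the vectorized form is generally the faster choice on larger frames.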