Search code examples
pythonpandasnumpyconditional-statementsisin

Conditions in python that are not hard-coded


I am trying to find a way to represent conditions for np.where() other than from within the code. In my example below,

import pandas as pd
import numpy as np

file='insert path'
df = pd.read_csv(file)
df.loc[:, ['col_a','col_b']] = df.loc[:, ['col_a','col_b']].astype(str)
dfseg=df['col_a']+'-'+df['col_b']+'-'+df['col_c']

df['col_d'] = np.where((df['col_a']=='101')|(df['col_a']=='337')|(df['col_a']=='524'),dfseg,df['col_a'])

df

the conditions are that

a. df['col_a']=='101' or

b. df['col_a']=='337' or

c. df['col_a']=='524'

Is there any way these conditions could be represented external to the code? Or through another python method/function? Can tkinter extract this portion of the code and edit these conditions? Hoping to find a way these conditions could be reviewed and managed, etc. externally rather than within the script.

I have tried np.isin() and np.in1d but received False results:

print(np.in1d(df1['col_a'],df2['col_a']))
print(np.isin(df1['col_a'],df2['col_a']))

Output:

[False False False False False False]
[False False False False False False]

From the following:

df2: 
    col_a
0    101
1    524
2    337 
 df1: 
   col_a col_b col_c
0   101   104   AAB
1   337   103   CAD
2   524   204   DER
3   129   404   EEH
4   842   108   HHR
5   337   108   HHE 
 df2[col_a].isin df1[col_a] 
 0    False
1    False
2    False
3    False
4    False
5    False
Name: col_a, dtype: bool 
 df1[col_a].isin df2[col_a] 
 0    True
1    True
2    True
Name: col_a, dtype: bool

Converting df to numpy with

df1numpy = df1.to_numpy()

Similarly yielded all "False" matches with np.isin(). Thanks for your thoughts.


Solution

  • Turns out, after converting df to numpy, the array needs to be transposed as well:

    df1numpy1stcol = np.transpose(df1numpy)[0]
    

    (df2 does not need to be converted to numpy, updated below).

    Then, change datatype from object:

    df1numpy1stcol.astype(np.int32)
    

    Finally, the np.where() could be used with np.isin():

    df1['col_d'] = np.where(np.isin(df1numpy1stcol.astype(np.int32),df2),
                           dfseg,df1['col_a'])
    

    thus, df1 [output] as:

    col_a col_b col_c col_d
    0 101 104 AAB 101-104-AAB
    1 337 103 CAD 337-103-CAD
    2 524 204 DER 524-204-DER
    3 129 404 EEH 129
    4 842 108 HHR 842
    5 337 108 HHE 337-108-HHE

    The exceptions list is now represented as an editable list named "df2".