Can i go from 15 object variables to one final binary target variable?
Those 15 variables has ~10.000 different codes, my dataset is about 21.000.000 records. What im trying to do is at first replace the codes i want with 1 and the other with 0, then if one of fifteen variables is 1 the target variable will be 1, if all fifteen variables are 0 the target variable will be 0.
i have tried to work with to_replace, as_type, to_numeric, infer_objects with not good results,for example my dataset look like this head(5):
D P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
41234 1234 4367 874 NAN NAN NAN 789 NAN NAN NAN NAN NAN NAN NAN NAN
42345 7657 4367 874 NAN NAN NAN 789 NAN NAN NAN NAN NAN NAN NAN NAN
34212 7654 4347 474 NAN NAN NAN 789 NAN NAN NAN NAN NAN NAN NAN NAN
34212 8902 4317 374 NAN 452 NAN 719 NAN NAN NAN NAN NAN NAN NAN NAN
19374 2564 4387 274 NAN 452 NAN 799 NAN NAN NAN NAN NAN NAN NAN NAN
I want to transform all nan as 0, and selected codes with 1, so all the P1-P15 will be binary and the i will create a final P variable with them.
For example if P1-P15 have '3578','9732','4734'...(im using about 200 codes) i want to become 1. All the other values i want to become 0. D variable should stay as it is. The final dataset will be (D,P), then i will add the train variables
Any ideas? The following code gives me wrong results.
selCodes=['3722','66']
dfnew['P']=(dfnew.loc[:,'PR1':].astype(str).isin(selCodes).any(axis=1).astype(int))
Take a look at a test dataset(left), and new P(right).With the example code 3722 P should be 1.
IIUC, Use, DataFrame.isin
:
# example select codes
selCodes = ['1234', '9732', '719']
df['P'] = (
df.loc[:, 'P1':].astype(str)
.isin(selCodes).any(axis=1).astype(int)
)
df = df[['D', 'P']]
Result:
D P
0 41234 1
1 42345 0
2 34212 0
3 34212 1
4 19374 0