python transform one-hot-encoding

OneHotEncoding a (categorical) column but with the value of another column of the Dataframe (not with value "1")


(This is my first question on Stack Overflow, so please be indulgent.)

I am coding an ANN on a data set that contains, among others, the following columns:

[... , 'labels_column', 'Content %']

I would like labels_column to be encoded to numeric columns (as with the OneHotEncoder I am using now), but with each row's value taken from the 'Content %' column instead of 1.

For example:

labels_column  Content %
label_1        37
label_2        24
label_3        12
label_2        60

After the transform, this should become:

label_1  label_2  label_3
37       0        0
0        24       0
0        0        12
0        60       0

And not:

label_1  label_2  label_3  Content %
1        0        0        37
0        1        0        24
0        0        1        12
0        1        0        60

I haven't managed to do it yet with masks or other tricks.

Thanks a lot for your help!


Solution

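  • You could multiply the dummy columns by 'Content %' using broadcasting:

    import pandas as pd

    df = pd.DataFrame({'labels_column': ['label_1','label_2','label_3','label_2'],
                       'Content %': [37, 24, 12, 60]})

    # (4, 3) dummies times a (4, 1) column: each row is scaled by its own 'Content %'
    pd.get_dummies(df['labels_column']) * df[['Content %']].values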
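
The trick works because the dummy frame has one row per sample and `df[['Content %']].values` is a single column of the same height, so each row of 0/1 indicators is multiplied by that row's content value. Below is a minimal sketch of an equivalent, index-aligned form. The use of `dtype=int`, `DataFrame.mul(..., axis=0)` and the final `join` are my own choices for illustration, not part of the answer itself, and the variable names `dummies`, `encoded` and `result` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "labels_column": ["label_1", "label_2", "label_3", "label_2"],
    "Content %": [37, 24, 12, 60],
})

# One indicator column per label. dtype=int requests 0/1 integers
# (recent pandas versions return booleans by default).
dummies = pd.get_dummies(df["labels_column"], dtype=int)

# Scale each row of indicators by that row's 'Content %' value.
# axis=0 aligns the Series with the rows (by index), not the columns.
encoded = dummies.mul(df["Content %"], axis=0)

# Swap the two source columns for the encoded ones; any other
# columns in a larger frame would be kept as they are.
result = df.drop(columns=["labels_column", "Content %"]).join(encoded)

print(result)
# result["label_1"].tolist() -> [37, 0, 0, 0]
# result["label_2"].tolist() -> [0, 24, 0, 60]
# result["label_3"].tolist() -> [0, 0, 12, 0]
```

Multiplying by the Series with `axis=0` matches rows by index label rather than by position, which may be safer than `.values` if the frame has been filtered or reordered beforehand.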