
One-hot encoding classification


I have a CSV file with these columns:

F1  |  F2  |  F3  |  F4  |  Label  

I used get_dummies to convert the label to a one-hot encoding. The data contains 3 different labels, so the file now looks like this:

F1  |  F2  |  F3  |  F4  |  Label1  |  Label2  |  Label3
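A minimal sketch of that encoding step, assuming the original Label column holds the class values 1, 2 and 3 (the feature values below are made up). With columns=["Label"], pandas uses the column name as the prefix, so prefix_sep="" produces the names Label1, Label2, Label3, and the new columns are appended after the untouched feature columns:

```python
import pandas as pd

# Hypothetical data: four features and a Label column with three
# classes encoded as 1, 2, 3 (the real CSV may use other values).
df = pd.DataFrame({
    "F1": [0.1, 0.4, 0.7, 0.2],
    "F2": [1.0, 0.5, 0.3, 0.8],
    "F3": [5, 3, 2, 4],
    "F4": [0, 1, 1, 0],
    "Label": [1, 2, 3, 2],
})

# Encode only the Label column. The prefix defaults to the column name,
# prefix_sep="" yields Label1/Label2/Label3, and dtype=int gives 0/1
# values instead of booleans (the default in recent pandas versions).
encoded = pd.get_dummies(df, columns=["Label"], prefix_sep="", dtype=int)

print(list(encoded.columns))
# ['F1', 'F2', 'F3', 'F4', 'Label1', 'Label2', 'Label3']
```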

Let's say I want to use this data to train a machine learning model. I need to separate the feature columns from the label columns. Can I set them like this:

Features, x = [0:3]
Labels, y = [4:6]

Is this right? I'm worried that, done this way, the task might be treated as a multi-label problem, which it is not: originally it was a multi-class classification problem.

Any help would be much appreciated.
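The multi-class vs multi-label distinction raised here shows up in the content of each row, not in the column layout. In a one-hot (multi-class) target every row has exactly one 1; in a multi-hot (multi-label) target a row can have several 1s or none. A small NumPy sketch with made-up arrays (is_single_label is a hypothetical helper, not a library function):

```python
import numpy as np

# One-hot (multi-class) targets: exactly one active class per row.
one_hot = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

# Multi-hot (multi-label) targets: a row may have several active
# classes, or none at all.
multi_hot = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 1],
])

def is_single_label(targets: np.ndarray) -> bool:
    """Return True if every row has exactly one active class."""
    return bool((targets.sum(axis=1) == 1).all())

print(is_single_label(one_hot))    # True
print(is_single_label(multi_hot))  # False
```

Labels produced by get_dummies from a single Label column always pass this check. Whether a model then treats them as multi-class or multi-label is typically decided by the model setup (for example, a softmax output with categorical cross-entropy versus independent sigmoid outputs with binary cross-entropy), not by how the columns are sliced.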


Solution

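  • You can use iloc (select by position) or filter (select by column name). Note that Python slices exclude the end position: with the seven columns at positions 0 to 6, [0:3] selects only F1 to F3 and [4:6] only Label1 and Label2, so use :4 for the features and 4: for the labels.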

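    # df is the DataFrame after get_dummies
    # By position: the first four columns are features, the rest are labels
    x = df.iloc[:, :4]   # positions 0-3: F1..F4
    y = df.iloc[:, 4:]   # position 4 to the end: Label1..Label3
    
    # or, by name: keep columns whose name contains the given substring
    
    x = df.filter(like='F')
    y = df.filter(like='Label')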
    
    print(x)
    
       F1  F2  F3  F4
    0   1   2   3   4
    1   1   2   3   4
    2   1   2   3   4
    
    print(y)
    
      Label1 Label2 Label3
    0      x      y      z
    1      x      y      z
    2      x      y      z
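A quick check of the slice boundaries on a hypothetical encoded frame with the question's column layout (the values are made up). It also shows one optional extra step, assuming a library that wants a 1-D vector of class indices rather than one-hot columns: taking the argmax of each label row recovers the class index, which again shows there is exactly one class per row.

```python
import pandas as pd

# Hypothetical encoded frame with the column layout from the question.
df = pd.DataFrame(
    [[1, 2, 3, 4, 1, 0, 0],
     [1, 2, 3, 4, 0, 1, 0],
     [1, 2, 3, 4, 0, 0, 1]],
    columns=["F1", "F2", "F3", "F4", "Label1", "Label2", "Label3"],
)

# Slices stop before their end position, so [0:3] misses F4
# and [4:6] misses Label3.
print(list(df.columns[0:3]))  # ['F1', 'F2', 'F3']
print(list(df.columns[4:6]))  # ['Label1', 'Label2']

# Selecting all seven columns (positions 0-6) correctly:
x = df.iloc[:, :4]   # F1..F4
y = df.iloc[:, 4:]   # Label1..Label3

# Optional: convert one-hot rows back to class indices (0, 1, 2).
y_idx = y.to_numpy().argmax(axis=1)
print(y_idx)  # [0 1 2]
```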