
How to specify which column to remove in get_dummies in pandas


I have a DataFrame column with 3 values: Bart, Peg, and Human. I need to one-hot encode them such that Bart and Peg stay as columns and Human is represented as 0 0.

Xi | Architecture
0  | Bart
1  | Bart
2  | Peg
3  | Human
4  | Human
5  | Peg
..
.

The desired output, with Human encoded as 0 0, is:

Xi |Bart| Peg
0  | 1  | 0
1  | 1  | 0
2  | 0  | 1
3  | 0  | 0
4  | 0  | 0
5  | 0  | 1

But when I run:

pd.get_dummies(df['Architecture'], drop_first = True)

it removes "Bart" and keeps the other two. Is there a way to specify which column to remove?


Solution

  • IIUC, use str.get_dummies and then drop the 'Human' column:

    df['Architecture'].str.get_dummies().drop('Human', axis=1)
    

    Output:

       Bart  Peg
    0     1    0
    1     1    0
    2     0    1
    3     0    0
    4     0    0
    5     0    1
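
    For completeness, here is a self-contained sketch of two other ways to get the same result. It assumes the six sample rows from the question; the variable names (encoded, encoded_cat, human_first) are just for illustration. The first option uses pd.get_dummies and drops the baseline column by name. The second relies on drop_first=True removing the first category, so it reorders the categories to put 'Human' first. dtype=int is passed because newer pandas versions return boolean dummies by default.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame(
    {"Architecture": ["Bart", "Bart", "Peg", "Human", "Human", "Peg"]}
)

# Option 1: encode every category, then drop the baseline column by name.
encoded = pd.get_dummies(df["Architecture"], dtype=int).drop(columns="Human")

# Option 2: put 'Human' first in the category order so that
# drop_first=True removes it instead of 'Bart'.
human_first = pd.CategoricalDtype(categories=["Human", "Bart", "Peg"])
encoded_cat = pd.get_dummies(
    df["Architecture"].astype(human_first), drop_first=True, dtype=int
)

print(encoded)
print(encoded_cat)
```

    Option 1 is probably the simplest when you already know the baseline name. Option 2 may be handy if the same category order should be reused elsewhere, for example when encoding a separate test set.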