python pandas dataframe one-hot-encoding

How to run get_dummies() function on multiple columns for the same category type?

I have a features DataFrame that looks like this:

Symptom A	Symptom B
Itching	Rash
Rash	Itching

When I run the get_dummies function on this DataFrame, it creates four columns named 'Symptom_A_Itching', 'Symptom_A_Rash', 'Symptom_B_Rash', 'Symptom_B_Itching'. I don't want the values in the two columns to be treated as separate categories, which is what this function does.

Is there any way to perform one-hot encoding on this DataFrame so that the values of both columns are treated as a single category?

Basically, I want to get a DataFrame with columns 'Symptom_Itching', 'Symptom_Rash'.

I tried using the columns and prefix arguments in the get_dummies function, but that did not produce any results. I also tried setting all the Symptom column names to just 'Symptom' instead of 'Symptom_A', 'Symptom_B', but that also didn't work.

This is the code I have:

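from pandas import DataFrame, get_dummies, read_csv

data_frame: DataFrame = read_csv('dataset.csv')
features: DataFrame = data_frame.iloc[:, 1:]
# fillna returns a new DataFrame, so the result must be assigned
features = features.fillna('')
x: DataFrame = get_dummies(features)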

Solution

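Use stack to reshape the columns into a single Series, then str.get_dummies, and aggregate back to one row per original row with groupby.max():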

out = (df
   .stack().str.get_dummies()
   .groupby(level=0).max()
 )

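Or use a trick to give the output columns identical names, then take the maximum across same-named columns. (groupby(axis=1) is deprecated as of pandas 2.1, so this version groups on the transposed frame; dtype=int keeps the 0/1 output on pandas 2.0+, where get_dummies returns booleans by default.)

out = (pd.get_dummies(df.rename(columns=lambda x: ''), prefix_sep='', dtype=int)
         .T.groupby(level=0).max().T
       )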

Output:

   Itching  Rash
0        1     1
1        1     1
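
To get the 'Symptom_Itching' / 'Symptom_Rash' column names asked for in the question, the stack approach can be followed by add_prefix. Below is a minimal, self-contained sketch that assumes the two-row sample from the question; the DataFrame is rebuilt by hand here rather than read from 'dataset.csv', and the 'Symptom_' prefix is simply the naming the question requested.

```python
import pandas as pd

# Sample data from the question, built by hand for illustration
df = pd.DataFrame({
    "Symptom A": ["Itching", "Rash"],
    "Symptom B": ["Rash", "Itching"],
})

out = (
    df.stack()                   # long Series: one entry per (row, column) cell
      .str.get_dummies()         # one indicator column per distinct value
      .groupby(level=0).max()    # collapse back to one row per original row
      .add_prefix("Symptom_")    # column names requested in the question
)

print(out)
```

Because the aggregation is max(), a symptom is flagged for a row if it appears in any of that row's columns, regardless of which column it came from.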