One of my features comes from a "select all that apply" question, so each entry can contain multiple values separated by commas, like:
and so on. I need to convert this to numerical data so I can use it in my machine learning model, ideally with something similar to what OneHotEncoder does. How do I handle this kind of data?
EDIT:
Here is what I imagine the result would look like:
You want `Series.str.get_dummies`, then `DataFrame.add_prefix` to get your desired column names:
```python
import pandas as pd
df['Feature'].str.get_dummies(sep=',').add_prefix('feature_')
```
```
   feature_option1  feature_option2  feature_option3  feature_option4
0                1                0                1                0
1                0                0                0                1
2                0                1                1                0
```
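Since the goal is to feed a model, two details may matter: `str.get_dummies` splits on `sep` exactly, so `"a, b"` produces a label with a leading space, and encoding new data separately can yield a different set of columns. The following is a minimal sketch that handles both, assuming the column is called `Feature` as above; the sample strings and the helper name `encode_multiselect` are hypothetical, made up for illustration since the question's actual values aren't shown.

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration; not the asker's real values).
train = pd.DataFrame({"Feature": ["option1,option3", "option4", "option2, option3"]})
test = pd.DataFrame({"Feature": ["option3,option5", "option1"]})


def encode_multiselect(col: pd.Series, prefix: str = "feature_") -> pd.DataFrame:
    """Hypothetical helper: one 0/1 column per distinct option in a comma-separated column."""
    # Remove spaces around commas so "a, b" and "a,b" give the same labels;
    # str.get_dummies does not strip whitespace around the separator itself.
    cleaned = col.str.replace(r"\s*,\s*", ",", regex=True).str.strip()
    return cleaned.str.get_dummies(sep=",").add_prefix(prefix)


train_X = encode_multiselect(train["Feature"])

# Align new data to the training columns: options never seen in training are dropped,
# and training options absent from the new data become all-zero columns.
test_X = encode_multiselect(test["Feature"]).reindex(columns=train_X.columns, fill_value=0)

print(train_X)
print(test_X)
```

If you'd rather stay within scikit-learn, `sklearn.preprocessing.MultiLabelBinarizer` does a similar job on lists of labels (for example the output of `df['Feature'].str.split(',')`) and remembers the classes seen during `fit`, which keeps train and test columns consistent.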