pandas machine-learning categorical-data one-hot-encoding

One-hot encoding for a multi-level categorical dataset

My dataset is as follows:

Symptoms (X) :: Condition (Y)
fever, headache, blindnes :: wagner syndrom
tooth pain,fever , sweet urine :: buri buri diseases
blindness,nose bleed,fever :: Taylor syndrome

Here X are the features and Y are my labels. I would like to encode X into a one-hot matrix. pandas' get_dummies can't handle multiple values in one column, but if I split X into multiple columns, I lose the ability to encode the symptoms into the same one-hot matrix.

Any ideas?

Solution

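You could do this with scikit-learn's CountVectorizer: each word becomes a column and each row is an observation. If you set the binary parameter to True, a word that is present in a row is represented as a 1 in that row and column. With binary set to False, the value is instead the number of times the word appears in that row. Note that the default tokenizer splits text into single words, so a multi-word symptom such as "tooth pain" ends up as two columns unless you supply a tokenizer that splits on commas.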
