My data:
Rank  Platforms       Technology
high  Windows||Linux  Unity
high  Linux
low   Windows         Unreal
low   Linux||MacOs    GameMakerStudio||Unity||Unreal
low                   GameMakerStudio
low
I want to convert it to something like this:
Rank  platform_Windows  platform_linux  platform_MacOs  technology_unity  technology_unreal  technology_GameMakerStudio
high  1                 1               0               1                 0                  0
high  0                 1               0               0                 0                  0
low   1                 0               0               0                 1                  0
low   0                 1               1               1                 1                  1
low   0                 0               0               0                 0                  1
low   0                 0               0               0                 0                  0
So it's a sort of one-hot encoding. I have followed many answers.
The issues are:
- The values in a cell are separated by the || delimiter.
- The new columns need the prefixes platform_ and technology_. I need this to know which original column each new column comes from.
My current code is:
df.drop('Platforms', 1).join(
    pd.get_dummies(
        pd.DataFrame(df.Platforms.str.split("||").tolist()).stack(),
        prefix=['platform']
    ).sum(level=0)
)
df.drop('Technology', 1).join(
    pd.get_dummies(
        pd.DataFrame(df.Technology.str.split("||").tolist()).stack(),
        prefix=['technology']
    ).sum(level=0)
)
But the error I get is:
TypeError: object of type 'float' has no len()
I have read the documentation for pandas.get_dummies and pandas.Series.str.get_dummies. The latter seems to accept a custom delimiter, while the former allows custom prefixes for the new columns...
You can do:
s = [df[col].str.get_dummies().add_prefix(f'{col.lower()}_')
     for col in ['Platforms', 'Technology']]
pd.concat([df[['Rank']]] + s, axis=1)
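For reference, here is a minimal, self-contained sketch of the same idea run against the sample data from the question. It assumes the empty cells are NaN. The prefixes mapping is a hypothetical helper, added here only so that the output uses the singular platform_ prefix the question asks for (col.lower() would give platforms_). As a precaution, the || delimiter is first collapsed to a single | so that str.get_dummies never sees an empty token between the two bars.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "Rank": ["high", "high", "low", "low", "low", "low"],
    "Platforms": ["Windows||Linux", "Linux", "Windows",
                  "Linux||MacOs", np.nan, np.nan],
    "Technology": ["Unity", np.nan, "Unreal",
                   "GameMakerStudio||Unity||Unreal", "GameMakerStudio", np.nan],
})

# Hypothetical mapping: source column -> prefix wanted on the new columns.
prefixes = {"Platforms": "platform_", "Technology": "technology_"}

parts = []
for col, prefix in prefixes.items():
    # Turn "a||b" into "a|b" (literal replace, not a regex), then let
    # str.get_dummies split on "|" and build one indicator column per value.
    dummies = (
        df[col]
        .str.replace("||", "|", regex=False)
        .str.get_dummies(sep="|")
    )
    parts.append(dummies.add_prefix(prefix))

# Keep Rank and append the indicator columns side by side.
result = pd.concat([df[["Rank"]]] + parts, axis=1)
print(result)
```

Rows whose cell is NaN end up as all zeros for that group of columns. If the column order matters, reorder result afterwards with an explicit column list.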