I'm trying to simplify a feature called 'programName' by mapping its text into a new feature called 'programGrp'. Instead of the low-level names for each program, such as Mech/Elec/Petroleum Engineering, I want to assign general groups such as STEM, Humanities, Life Sciences, etc.
Here's my attempt:
def fill_stem(df):
    for i in df:
        if df['programName'].str.contains('Engineering') | df['programName'].str.contains('Computer Science') | df['programName'].str.contains('Mathematics'):
            df['programGrp'].loc[i] = 'STEM'

fill_stem(df)
But when I run it, I get this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any tips on how best to go about this?
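The error comes from the if statement: str.contains returns a boolean Series with one value per row, and if needs a single True or False. You can skip the loop and use a boolean mask instead: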
df.loc[df['programName'].str.contains('Engineering|Computer Science|Mathematics'), 'programName'] = 'STEM'
OR:
import numpy as np
df['programName'] = np.where(df['programName'].str.contains('Engineering|Computer Science|Mathematics'), 'STEM', df['programName'])
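Note that both one-liners above overwrite 'programName' in place. If you want to keep the original names and write the groups to a new 'programGrp' column, as the question describes, a minimal sketch could look like the following. The sample rows, the group_patterns dictionary, the non-STEM keywords, and the 'Other' default are all assumptions made for illustration, not something taken from your data.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({
    "programName": [
        "Mechanical Engineering",
        "Computer Science",
        "English Literature",
        "Biology",
        "Culinary Arts",
    ]
})

# Hypothetical mapping: group label -> regex of keywords that belong to it.
group_patterns = {
    "STEM": "Engineering|Computer Science|Mathematics",
    "Humanities": "Literature|History|Philosophy",
    "Life Sciences": "Biology|Nursing",
}

# Start with a default label, then overwrite rows that match each pattern.
df["programGrp"] = "Other"
for group, pattern in group_patterns.items():
    # na=False treats missing program names as "no match".
    mask = df["programName"].str.contains(pattern, na=False)
    df.loc[mask, "programGrp"] = group

print(df)
```

If a program name matches more than one pattern, the group listed last in the dictionary wins, so order the entries accordingly.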