pandas, mapping, feature-engineering

Feature Engineer Text Based on Text from Another Column in Pandas


I'm trying to simplify a feature called 'programName' by mapping its text into a new feature called 'programGrp'. Instead of keeping the low-level name of each program, such as Mech/Elec/Petroleum Engineering, I want to assign each program to a general group such as STEM, Humanities, Life Sciences, etc.

Here's my attempt:

def fill_stem(df):
  for i in df:
    if df['programName'].str.contains('Engineering') | df['programName'].str.contains('Computer Science') | df['programName'].str.contains('Mathematics'):
      df['programGrp'].loc[i] = 'STEM'

fill_stem(df)

But when I run it, I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any tips on how best to go about this?


Solution

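  • The if statement fails because str.contains returns a boolean Series with one value per row, and Python cannot reduce a whole Series to a single True/False. Use that boolean Series as a row mask instead of looping. Since you want a new column, assign to 'programGrp':

    df.loc[df['programName'].str.contains('Engineering|Computer Science|Mathematics'), 'programGrp'] = 'STEM'

    Rows that do not match keep whatever 'programGrp' already held (NaN if the column is new). If 'programName' has missing values, pass na=False to str.contains so they count as non-matches.

    OR, with NumPy (requires import numpy as np), which falls back to the original name for rows that do not match:

    df['programGrp'] = np.where(df['programName'].str.contains('Engineering|Computer Science|Mathematics'), 'STEM', df['programName'])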
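
The question mentions several groups (STEM, Humanities, Life Sciences), so the same idea may need to cover more than one pattern. Below is a minimal sketch of one way to do that with np.select. The sample data, the Humanities and Life Sciences patterns, and the 'Other' fallback label are assumptions made for illustration; only the STEM pattern comes from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real program names are not shown in the question.
df = pd.DataFrame({
    "programName": [
        "Mechanical Engineering",
        "Computer Science",
        "History",
        "Biology",
        "Culinary Arts",
        None,
    ]
})

# Group name -> regex pattern. Only the STEM pattern is taken from the answer;
# the other two patterns are illustrative assumptions.
group_patterns = {
    "STEM": r"Engineering|Computer Science|Mathematics",
    "Humanities": r"History|Philosophy|Literature",
    "Life Sciences": r"Biology|Biochemistry",
}

# One boolean mask per group; na=False treats missing names as non-matches.
conditions = [
    df["programName"].str.contains(pattern, na=False).to_numpy(dtype=bool)
    for pattern in group_patterns.values()
]

# np.select takes the first group whose mask is True for each row,
# and uses the default when no mask matches.
df["programGrp"] = np.select(conditions, list(group_patterns.keys()), default="Other")

print(df)
```

Because np.select uses the first matching condition, the order of the entries in group_patterns decides which group wins when a name matches more than one pattern.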