python pandas numpy nlp lemmatization

Python: Can I create a dummy based on search conditions in one column with text series?


I was wondering how I could create a dummy variable for the following condition: the column 'lemmatised' contains at least two words from innovation_words, a list I defined myself:

innovation_words = ['community', 'local', 'charity', 'event', 'partner',
                'volunteering', 'plastic', 'surplusfood']

The lemmatised column looks like this (I'm fine changing the type or formatting if needed):

[Image: data to use for condition]

So, if an observation includes, for example, both local and plastic, I would like a dummy variable 'innovation' = 1. I hope someone can help me with this. Here is some code I already tried:

conditions = [df_posts['lemmatised'].isin(innovation_words), 
          df_posts['lemmatised'].isin(innovation_words)]

dummy = [1,0]

df_posts['innovation'] = np.select(conditions, dummy)

Solution

  • Use this code:

    
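    # Flag rows containing at least two of the innovation words, then cast True/False to 1/0
    df['new'] = df.lemmatised.map(lambda w: len([i for i in innovation_words if i in w]) > 1).astype(int)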
    
    

    Just rename the variables to match your own (for example, df to df_posts and 'new' to 'innovation').
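
One caveat with the one-liner: if 'lemmatised' holds plain strings, `i in w` is a substring test, so 'event' would also match inside 'prevent'. Below is a minimal sketch of a whole-word variant. The sample rows are made up for illustration (the real column was only shown as an image), and the helper name has_two_innovation_words is hypothetical. It assumes each cell is either a whitespace-separated string or a list of tokens.

```python
import pandas as pd

innovation_words = ['community', 'local', 'charity', 'event', 'partner',
                    'volunteering', 'plastic', 'surplusfood']

# Hypothetical sample data standing in for the real 'lemmatised' column.
df_posts = pd.DataFrame({
    'lemmatised': [
        'local shop cut plastic waste',        # two whole-word matches
        'new partnership announce',            # no whole-word match
        ['charity', 'event', 'volunteering'],  # list of tokens, three matches
        'prevent localise',                    # only substring matches
    ]
})

innovation_set = set(innovation_words)

def has_two_innovation_words(value):
    # Accept either a whitespace-separated string or a list of tokens.
    tokens = value.split() if isinstance(value, str) else value
    # Count the distinct innovation words that appear as whole tokens.
    return len(innovation_set.intersection(tokens)) >= 2

# Map to True/False, then cast to a 1/0 dummy.
df_posts['innovation'] = (
    df_posts['lemmatised'].map(has_two_innovation_words).astype(int)
)
print(df_posts['innovation'].tolist())
```

With this sample, only the first and third rows get innovation = 1. The last row would be flagged by a substring test but is not flagged here, because 'prevent' and 'localise' are not in the list as whole words. Missing values (NaN) are not handled in this sketch and would need to be filled or dropped first.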