python pandas compare tweets

Compare strings of a column in a dataframe with a set of words in a list


I have a dataframe with a single column, full_text, containing tweets, and a list, negative, containing negative words. I want to create a new column that is 1 if any of the negative words appears in the tweet and 0 otherwise.


Solution

  • Let's assume we have a dataframe data and a list negative_words like this:

    import pandas as pd

    data = pd.DataFrame({
        'Tweets' : ['This is bad', 'This is terrible', 'This is good', 'This is great'],
    })
    
    negative_words = ['bad', 'terrible']
    

    We can then use a lambda function with any:

    # any() already returns a bool, so no "True if ... else False" is needed.
    # Note: `word in tweet` is a substring test, so 'bad' would also match 'badge'.
    data['Negative'] = data['Tweets'].apply(lambda tweet: any(word in tweet for word in negative_words))
    

    This gives:

                 Tweets  Negative
    0       This is bad      True
    1  This is terrible      True
    2      This is good     False
    3     This is great     False
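
The question asks for 1 and 0 rather than True and False, and the substring test above would also flag words that merely contain a negative word. As a possible alternative, here is a minimal sketch using pandas' vectorized `Series.str.contains` with a word-boundary regex. It reuses the `data`, `Tweets` and `negative_words` names from the answer; the extra row `'Not a badge'` and the case-insensitive matching are assumptions added for illustration, not part of the original.

```python
import re

import pandas as pd

# Same sample data as above, plus one illustrative row (an assumption)
# that contains 'bad' only as part of a longer word.
data = pd.DataFrame({
    'Tweets': ['This is bad', 'This is terrible', 'This is good', 'Not a badge'],
})
negative_words = ['bad', 'terrible']

# Build a single pattern of the form \b(?:bad|terrible)\b.
# re.escape protects against regex metacharacters in the word list;
# the non-capturing group (?:...) avoids pandas' match-group warning.
pattern = r'\b(?:' + '|'.join(re.escape(word) for word in negative_words) + r')\b'

# str.contains returns booleans; astype(int) converts them to 1/0.
# na=False treats missing tweets as "no negative word found".
data['Negative'] = (
    data['Tweets']
    .str.contains(pattern, case=False, regex=True, na=False)
    .astype(int)
)

print(data)
```

With whole-word matching, 'Not a badge' is not flagged even though it contains the letters 'bad'. If substring matching is actually what you want, drop the `\b` anchors from the pattern.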