
GraphLab: How to avoid manually duplicating functions that differ only in one string variable?


I imported my dataset with SFrame:

import graphlab

products = graphlab.SFrame('amazon_baby.gl')
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
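
For readers without GraphLab at hand, the rest of the question assumes each entry of word_count is a dictionary mapping a word to the number of times it appears in that review. The sketch below is a plain-Python stand-in for that idea, not GraphLab's implementation: count_words_sketch is a hypothetical name, and its tokenization (lowercasing, splitting on whitespace) is an assumption made for illustration.

```python
from collections import Counter


def count_words_sketch(text):
    """Hypothetical stand-in: lowercase, split on whitespace, count tokens."""
    return dict(Counter(text.lower().split()))


review = "Love it love it great stroller"
word_count = count_words_sketch(review)

# Looking up a word that is absent needs a fallback, hence the 0 default.
awesome = word_count.get('awesome', 0)
love = word_count.get('love', 0)
```

A missing word is simply not a key in the dictionary, which is why the counting functions below fall back to 0.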

I would like to do sentiment analysis on the set of words shown below:

selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']

Then I would like to create a new column in the products SFrame for each of the selected words, where each entry is the number of times that word occurs in the review. So I created a function for the word "awesome":

def awesome_count(word_count):
    # word_count is a dict mapping each word in a review to its count
    if 'awesome' in word_count:
        return word_count['awesome']
    else:
        return 0

products['awesome'] = products['word_count'].apply(awesome_count)

So far so good, but I would need to manually create a similar function for each of the other selected words (great_count, and so on). How can I avoid this manual effort and write cleaner code?


Solution

  • I actually found an easier way to do this:

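    def wordCount_select(wc, selectedWord):
        # Return the count of selectedWord in the word-count dict wc, or 0 if absent
        if selectedWord in wc:
            return wc[selectedWord]
        else:
            return 0

    # Create one column per selected word
    for word in selected_words:
        products[word] = products['word_count'].apply(lambda wc: wordCount_select(wc, word))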
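
The same idea can be sketched without GraphLab, which makes it easy to try out. In the example below, a plain list of dictionaries stands in for products['word_count'], and make_counter is a hypothetical helper (not part of the original answer) that builds a one-argument lookup function for a given word. Building the function through a factory fixes the word when the function is created, so nothing depends on when the loop variable is read.

```python
# Plain-Python stand-in: each dict plays the role of one row of products['word_count'].
word_counts = [
    {'awesome': 2, 'great': 1, 'stroller': 1},
    {'bad': 1, 'love': 3},
    {},
]
selected_words = ['awesome', 'great', 'love', 'bad']


def make_counter(word):
    """Return a one-argument function that looks up `word` in a word-count dict."""
    def count(wc):
        return wc.get(word, 0)  # 0 when the word is absent
    return count


# One "column" per selected word, without writing a separate function per word.
columns = {
    word: list(map(make_counter(word), word_counts))
    for word in selected_words
}
```

With an SFrame, the equivalent step would presumably be products[word] = products['word_count'].apply(make_counter(word)) inside the loop over selected_words. That line is an untested assumption here.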