python machine-learning nlp

How to approach this NLP algorithm solution?


I want to build an algorithm but I'm not sure how to approach it. I have a bag of words taken from some Amazon reviews (my data set), and each review comes with a product rating (0-5 stars). My goal is to use these words to predict a rating for reviews that don't have one. How could I approach this problem?

The solution I thought of is to first map the words to the reviews and give each word the rating of the review it appears in as a score. I would repeat this for all the rated reviews, then average each word's score over the number of rated reviews that contain it (I still need to figure out what to do when a word appears twice in a review). Finally, I would apply the model to the reviews without ratings: add up the scores of the bag-of-words entries found in the review and divide by how many of them were found.

Something like this:

review_one = {'food is nice': 4}
review_two = {'its alright': 3}
review_three = {'bad' : 1}
review_four = {'its nice': 3}

bag_of_words = ['bad', 'nice', 'alright', 'food']

trained_model = {'bad': 1, 'nice': 3.5, 'alright': 3, 'food': 4}

test_review = "service bad but food nice"

# desired behaviour (pseudocode): trained_model.predict(test_review)

test_review_rating = 2.83  # (1 + 3.5 + 4) / 3

My solution seems far too tedious, so I wanted to know whether there is a better way to approach this, or whether I should be doing something completely different.


Solution

  • I would create the model as a class that holds the bow (bag_of_words) and its associated weights, with one method for training the weights and another for predicting. There are many ways to do this, but one implementation could be:

    class Simple_npl_model():
        def __init__(self, bag_of_words, weights=None):
            self.bow = bag_of_words
            if weights is None:
                # every word starts at 0 until it is trained
                self.weights = len(bag_of_words) * [0]
            else:
                # reuse weights supplied by the caller
                self.weights = list(weights)
    
        def train(self, train_dict):
            # each word's weight is the mean rating of the reviews containing it
            for idx, word in enumerate(self.bow):
                word_score = 0
                word_count = 0
                for key, val in train_dict.items():
                    # compare whole tokens so 'bad' does not match 'badly'
                    if word in key.split():
                        word_count += 1
                        word_score += val
                if word_count:
                    self.weights[idx] = word_score / word_count
    
        def predict(self, phrase):
            # mean weight of the bag-of-words entries found in the phrase
            tokens = phrase.split()
            phrase_score = 0
            bow_count = 0
            for idx, word in enumerate(self.bow):
                if word in tokens:
                    phrase_score += self.weights[idx]
                    bow_count += 1
            if bow_count:
                return phrase_score / bow_count
            else:
                # none of the known words appear in the phrase
                return 0
    
        @property
        def bow_dict(self):
            # trained parameters as a {word: score} mapping
            return {word: score for word, score in zip(self.bow, self.weights)}
    

    Then you can create an instance and train it with your data. To make this easier, I put all the reviews in one dict.

    review_one = {'food is nice': 4}
    review_two = {'its alright': 3}
    review_three = {'bad' : 1}
    review_four = {'its nice': 3}
    
    reviews  = {**review_one, **review_two, **review_three, **review_four}
    
    bag_of_words = ['bad', 'nice', 'alright', 'food']
    
    model= Simple_npl_model(bag_of_words)
    model.train(reviews)
    
    # expected: {'bad': 1.0, 'nice': 3.5, 'alright': 3.0, 'food': 4.0}
    print(model.bow_dict)
    

    Finally, predict with the model:

    #test
    test_review = "service bad but food nice"
    print(f"Review Rating: {model.predict(test_review)}")
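
The question leaves open what to do when a word appears twice in a review. One possible answer, sketched here as an assumption rather than as part of the class above, is to count every occurrence, so a repeated word pulls its score further toward that review's rating. The function names `train_word_scores` and `predict_rating` and the fifth review are hypothetical and only there for illustration.

```python
from collections import Counter, defaultdict


def train_word_scores(rated_reviews, bag_of_words):
    """Average rating per word, weighting each review by how often the word occurs."""
    totals = defaultdict(float)  # sum of rating * occurrences, per word
    counts = defaultdict(int)    # total occurrences, per word
    vocab = set(bag_of_words)
    for text, rating in rated_reviews:
        for word, n in Counter(text.lower().split()).items():
            if word in vocab:
                totals[word] += rating * n
                counts[word] += n
    # words that never occur in a rated review get no score at all
    return {word: totals[word] / counts[word] for word in counts}


def predict_rating(text, word_scores, default=None):
    """Mean score over every occurrence of a known word; default if none match."""
    hits = [word_scores[w] for w in text.lower().split() if w in word_scores]
    return sum(hits) / len(hits) if hits else default


rated_reviews = [
    ("food is nice", 4),
    ("its alright", 3),
    ("bad", 1),
    ("its nice", 3),
    ("nice food nice service", 5),  # hypothetical review with a repeated word
]
bag_of_words = ["bad", "nice", "alright", "food"]

word_scores = train_word_scores(rated_reviews, bag_of_words)
print(word_scores)
print(predict_rating("service bad but food nice", word_scores))
```

The reviews are passed as a list of (text, rating) pairs instead of a dict, so two reviews with identical text but different ratings do not overwrite each other. Returning `default` (here `None`) for a review with no known words also keeps "no information" distinct from a real rating of 0.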