Python KeyError when using pandas


I'm following a tutorial on NLP but have run into a KeyError when trying to group my raw data into good and bad reviews. Here is the tutorial link: https://towardsdatascience.com/detecting-bad-customer-reviews-with-nlp-d8b36134dc7e

#reviews.csv
I am so angry about the service
Nothing was wrong, all good
The bedroom was dirty
The food was great

#nlp.py
import pandas as pd

#read data
reviews_df = pd.read_csv("reviews.csv")
# append the positive and negative text reviews
reviews_df["review"] = reviews_df["Negative_Review"] + reviews_df["Positive_Review"]

reviews_df.columns

I'm seeing the following error:

File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Negative_Review'

Why is this happening?


Solution

  • You're getting this error because reviews.csv has no Negative_Review or Positive_Review columns, so there is nothing for those keys to look up.

    When you write reviews_df["review"] = reviews_df["Negative_Review"] + reviews_df["Positive_Review"], you are concatenating the values of the Negative_Review and Positive_Review columns, neither of which exists yet, and storing the result in a new review column. The lookup of the first missing column is what raises the KeyError.

    Your CSV is nothing more than a plain-text file with one review per row. Also, since you're working with text, remember to enclose every string in double quotation marks ("); otherwise the commas inside a review will be read as column separators and create spurious columns.

    With your approach, it looks like you will still have to label every review by hand. In machine learning work, labeling is usually done outside the code, and the labeled data is then loaded into the script.

    In order for your code to work, you want to do the following:

    import pandas as pd
    
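    # names=['text'] labels the single column, since the file has no header row
    df = pd.read_csv('TestFileFolder/57886076.csv', names=['text'])
    # Fill with placeholder values until the reviews are actually labeled
    df['Positive_review'] = 0
    df['Negative_review'] = 1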
    df.head()
    

    Result:

                                  text  Positive_review  Negative_review
    0  I am so angry about the service                0                1
    1      Nothing was wrong, all good                0                1
    2            The bedroom was dirty                0                1
    3               The food was great                0                1
    

    However, I would recommend a single column (is_review_positive) set to True or False. You can easily encode it later on.
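
As a rough sketch of the two suggestions above (quoting each review and keeping a single label column), something like the following should work. The file contents are inlined through io.StringIO so the example runs on its own. The True/False labels are hand-assigned purely for illustration, and the label column name is an assumption, not something from the tutorial.

```python
import io

import pandas as pd

# Hypothetical file contents: one review per row, each wrapped in double
# quotes so the comma in the second review is not treated as a delimiter.
csv_text = '''"I am so angry about the service"
"Nothing was wrong, all good"
"The bedroom was dirty"
"The food was great"
'''

# The file has no header row, so say so and name the single column here.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["text"])

# Hand-assigned labels, for illustration only.
df["is_review_positive"] = [False, True, False, True]

# Encode the boolean label as 0/1 for models that need numeric input.
df["label"] = df["is_review_positive"].astype(int)

print(df.columns.tolist())
```

Printing df.columns before indexing is also a quick way to confirm which column names actually exist, which is usually the first thing to check when a KeyError like the one in the question appears.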