I am working on my capstone project: a recommendation system for Amazon beauty products. The dataset is a TensorFlow dataset.
import tensorflow_datasets as tfds
data = tfds.load('amazon_us_reviews/Beauty_v1_00', split='train')
type: tensorflow.python.data.ops.dataset_ops.PrefetchDataset
Display some info about the features:
import pprint
for sample in data.take(1).as_numpy_iterator():
    pprint.pprint(sample)
Output:
{'data': {'customer_id': b'18239070',
          'helpful_votes': 0,
          'marketplace': b'US',
          'product_category': b'Beauty',
          'product_id': b'B00LJ86MAY',
          'product_parent': b'823234087',
          'product_title': b'The Original Curly Tee Towel - T-Shirt Hair Dryi'
                           b'ng Towel Wrap (Extra Long)',
          'review_body': b'Great product, quick ship and packaged nicely with a'
                         b'ttention to detail. Thank you!',
          'review_date': b'2014-10-04',
          'review_headline': b'Very pleased!',
          'review_id': b'R24WHRN0BMM2K7',
          'star_rating': 5,
          'total_votes': 0,
          'verified_purchase': 1,
          'vine': 1}}
I tried to select only some of the columns using the map function:
data = data.map(lambda x: {
    "customer_id": x["customer_id"],
    "product_id": x["product_id"],
    "star_rating": x["star_rating"]
})
KeyError: in user code:
KeyError: 'customer_id'
The code in the tutorial works fine, but the same approach fails when I try it on this dataset. I have been googling and could not find an answer.
Do you have any suggestions? Thanks in advance for your time.
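You are missing the "data" key when accessing the dictionary. As your printed sample shows, every field of an element is nested under a single top-level "data" key, so x["customer_id"] does not exist at the top level.
This should fix it: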
data = data.map(lambda x: {
    "customer_id": x["data"]["customer_id"],
    "product_id": x["data"]["product_id"],
    "star_rating": x["data"]["star_rating"]
})
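
If you want to check the fix without downloading the full dataset, here is a minimal sketch that builds a tiny in-memory dataset with the same nesting. The name `toy`, the helper `select_columns`, and the second row's values are made up for illustration; only the first row's values come from the sample printed in the question. It also prints `element_spec`, which should show the nested structure without iterating over any data.

```python
import tensorflow as tf

# Hypothetical two-row stand-in for the real dataset. It only mimics the
# nesting seen in the question: every element is {"data": {...}}.
toy = tf.data.Dataset.from_tensor_slices({
    "data": {
        "customer_id": [b"18239070", b"00000001"],    # second value is made up
        "product_id": [b"B00LJ86MAY", b"B000000000"],  # second value is made up
        "star_rating": [5, 3],
    }
})

# element_spec describes the structure of one element, including the
# top-level "data" key that the original lambda skipped.
print(toy.element_spec)


def select_columns(x):
    """Keep three fields, reading them from under the "data" key."""
    inner = x["data"]
    return {
        "customer_id": inner["customer_id"],
        "product_id": inner["product_id"],
        "star_rating": inner["star_rating"],
    }


ratings = toy.map(select_columns)
rows = list(ratings.as_numpy_iterator())
print(rows[0])
```

On the real dataset, the same `select_columns` function could be passed to `data.map` in place of the lambda.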