python tensorflow dictionary tensorflow-datasets keyerror

TensorFlow tf.data.Dataset error when using map function | KeyError


I am working on my capstone project: a recommendation system for Amazon beauty products. The dataset is loaded from TensorFlow Datasets.

Source code that works fine

 import tensorflow_datasets as tfds

 data = tfds.load('amazon_us_reviews/Beauty_v1_00', split='train')

 type: tensorflow.python.data.ops.dataset_ops.PrefetchDataset
  • Display some info about the features:

    import pprint

    for sample in data.take(1).as_numpy_iterator():
        pprint.pprint(sample)
    
  • Output

      {'data': {'customer_id': b'18239070',
                'helpful_votes': 0,
                'marketplace': b'US',
                'product_category': b'Beauty',
                'product_id': b'B00LJ86MAY',
                'product_parent': b'823234087',
                'product_title': b'The Original Curly Tee Towel - T-Shirt Hair Dryi'
                                 b'ng Towel Wrap (Extra Long)',
                'review_body': b'Great product, quick ship and packaged nicely with a'
                               b'ttention to detail. Thank you!',
                'review_date': b'2014-10-04',
                'review_headline': b'Very pleased!',
                'review_id': b'R24WHRN0BMM2K7',
                'star_rating': 5,
                'total_votes': 0,
                'verified_purchase': 1,
                'vine': 1}}
    

Error

I try to select only some of the columns using the map function, but it raises a KeyError:

    data = data.map(lambda x: {
        "customer_id": x["customer_id"],
        "product_id": x["product_id"],
        "star_rating": x["star_rating"]
    })

KeyError: in user code:

       KeyError: 'customer_id'

The code in the tutorial works fine, but the same approach fails when I apply it to this dataset. I have been googling and could not find an answer.

Do you have any suggestions? Thanks in advance for your time.


Solution

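  • You are missing the "data" key when accessing the dictionary. As the sample output shows, every feature is nested under a top-level "data" key, so x["customer_id"] does not exist at the top level.

    This should fix it: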

    data = data.map(lambda x: {
        "customer_id": x["data"]["customer_id"],
        "product_id": x["data"]["product_id"],
        "star_rating": x["data"]["star_rating"]
    })
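
    To check the fix without downloading the full dataset, here is a minimal sketch on a toy dataset. The name toy, the helper select_columns, and the second row's values are made up for illustration; only the nesting under "data" mirrors the sample output in the question. Printing element_spec is a quick way to see the structure that map() passes to your function.

```python
import tensorflow as tf

# Hypothetical stand-in for the tfds dataset: two rows with the same
# nesting as the sample in the question (all features live under "data").
# The second row's values are invented for illustration.
toy = tf.data.Dataset.from_tensor_slices({
    "data": {
        "customer_id": [b"18239070", b"11111111"],
        "product_id": [b"B00LJ86MAY", b"B000000000"],
        "star_rating": [5, 3],
    }
})

# element_spec describes one element; its only top-level key is "data".
print(toy.element_spec)


def select_columns(x):
    """Keep three features, stepping into the nested "data" dict first."""
    features = x["data"]
    return {
        "customer_id": features["customer_id"],
        "product_id": features["product_id"],
        "star_rating": features["star_rating"],
    }


selected = toy.map(select_columns)
rows = list(selected.as_numpy_iterator())
print(rows[0])
```

    If element_spec on your real dataset shows a different top-level key, adjust the lookup in select_columns accordingly.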