python pytorch dataset torchvision

__getitem__ in Dataset class giving KeyError: Python


I am writing this dataset class, and __getitem__ raises this error:

>   0
>   {}
> 
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> <ipython-input-65-263240bbee7e> in <cell line: 1>()
> ----> 1 main()
> 
> 3 frames
> <ipython-input-64-2b81d181af45> in __getitem__(self, idx)
>      12       print(idx)
>      13       print(self.dataset[idx])
> ---> 14       fname = os.path.join(self.data_dir+'/images/', self.dataset[idx]['image'])
>      15       #if os.path.exists(fname):
>      16       img = Image.open(fname).convert('RGB')
> 
> KeyError: 'image'
> 
> 

One sample index of the dataset is:

defaultdict(<class 'dict'>, {939: {'image': 'COCO_train2014_000000163939.jpg', 'texts': {0: 'the right half of a keyboard'}}

class RefCOCOgDataset(Dataset):
 def __init__(self, dataset, transform=None, data_dir='dataset/refcocog'):
  super(RefCOCOgDataset, self).__init__()
  self.data_dir = data_dir
  self.transform = transform
  self.dataset = dataset
  print(self.dataset)


 def __getitem__(self, idx):  
  data_item = {i: torch.tensor(v[idx]) for i, v in self.dataset.items() if idx in 
  self.dataset.keys()}    
  print(idx)
  print(self.dataset[idx])
  fname = os.path.join(self.data_dir+'/images/', self.dataset[idx]['image'])
  #if os.path.exists(fname):
  img = Image.open(fname).convert('RGB')   
  image = self.transform(img)['image']
  data_item['image'] = image.permute(2, 0, 1).float()
  data_item['texts'] = self.dataset[idx]['texts']
  print(data_item[idx])
  
  return data_item


 def __len__(self):
  return len(self.dataset)

My question is: how do I map my dataset correctly so that it doesn't raise a KeyError and the dataset loads correctly in the DataLoader? I want idx to iterate over the keys of my dataset dict (for example idx=939) so that the image and texts are looked up as in the code I have written. Is that possible? I am not clear on how __getitem__ is called during iteration. Would someone please shed some light on this? Thanks in advance.


Solution

  • You revealed a subset of what is happening, instead of offering a reprex.

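    Apparently some dataset ds had a 939 key. But up in a hidden portion of the call stack you dereferenced ds[0]. That is expected: unless you supply your own sampler, a DataLoader asks for positional indices from 0 to len(dataset) - 1, not for the keys of your dict.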

     def __getitem__(self, idx):  
      data_item = {i: torch.tensor(v[idx]) for i, v in self.dataset.items() if idx in 
      self.dataset.keys()}    
      print(idx)
      print(self.dataset[idx])
    

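    So 0 is passed in as idx. Because self.dataset is a defaultdict(dict), looking up the missing key 0 does not raise; it silently creates and returns an empty dict. The pair of prints reports exactly this when they say 0 and {}.

    Given that, it is hardly surprising that attempting to dereference self.dataset[0]['image'] would report KeyError: 'image'.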
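
The defaultdict behaviour described above can be reproduced without PyTorch. The snippet below is a minimal sketch: the variable names are hypothetical, and the single entry is copied from the sample shown in the question.

```python
from collections import defaultdict

# Hypothetical stand-in for the asker's data: one entry, keyed by 939.
dataset = defaultdict(dict)
dataset[939] = {
    "image": "COCO_train2014_000000163939.jpg",
    "texts": {0: "the right half of a keyboard"},
}

# A membership test does not insert anything.
print(0 in dataset)

# Indexing a missing key runs the default factory dict() and stores the result.
entry = dataset[0]
print(entry)

# Key 0 is now part of the mapping, next to the real key 939.
print(sorted(dataset))

try:
    entry["image"]
except KeyError as exc:
    # The empty dict has no 'image' key, which is the error in the traceback.
    print("KeyError:", exc)
```

A side effect worth noting is that each such lookup also grows the mapping, so len(self.dataset) can change while the DataLoader is iterating.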


    Decompose your problem further, write one or more unit tests, and show us how the new code behaves in your use case. As presented, the specification is unclear.
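
One possible way to make positional indices line up with arbitrary dict keys is to store the list of keys once and translate idx through it. This is only a sketch under assumptions: KeyIndexedDataset, its keys attribute, and the returned field names are hypothetical, image loading and transforms are left out, and in the real code the class would subclass torch.utils.data.Dataset.

```python
import os
from collections import defaultdict


class KeyIndexedDataset:
    """Map-style dataset sketch: positions 0..len-1 map onto arbitrary dict keys.

    Hypothetical simplification of RefCOCOgDataset; image loading and
    transforms are omitted so the index mapping stays visible.
    """

    def __init__(self, dataset, data_dir="dataset/refcocog"):
        self.data_dir = data_dir
        self.dataset = dataset
        # Freeze the key order once; position i refers to self.keys[i].
        self.keys = list(dataset.keys())

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        # Reject out-of-range positions instead of touching the defaultdict.
        if not 0 <= idx < len(self.keys):
            raise IndexError(idx)
        key = self.keys[idx]          # e.g. position 0 -> key 939
        record = self.dataset[key]    # the key is known to exist
        return {
            "key": key,
            "image_path": os.path.join(self.data_dir, "images", record["image"]),
            "texts": record["texts"],
        }


# Usage with the single sample entry from the question.
raw = defaultdict(dict)
raw[939] = {
    "image": "COCO_train2014_000000163939.jpg",
    "texts": {0: "the right half of a keyboard"},
}
ds = KeyIndexedDataset(raw)
item = ds[0]
print(len(ds), item["key"], item["image_path"])
```

Because the lookup always goes through a key taken from self.keys, the defaultdict never receives a missing key and no empty entries are created. Assertions like the ones implied by the usage lines are also a reasonable starting point for the unit tests suggested above.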