So I am writing this dataset class, and `__getitem__` is giving me this error:
>     0
>     {}
>
>     ---------------------------------------------------------------------------
>     KeyError                                  Traceback (most recent call last)
>     <ipython-input-65-263240bbee7e> in <cell line: 1>()
>     ----> 1 main()
>
>     3 frames
>     <ipython-input-64-2b81d181af45> in __getitem__(self, idx)
>          12     print(idx)
>          13     print(self.dataset[idx])
>     ---> 14     fname = os.path.join(self.data_dir+'/images/', self.dataset[idx]['image'])
>          15     #if os.path.exists(fname):
>          16     img = Image.open(fname).convert('RGB')
>
>     KeyError: 'image'
One sample entry of the dataset is:

    defaultdict(<class 'dict'>, {939: {'image': 'COCO_train2014_000000163939.jpg', 'texts': {0: 'the right half of a keyboard'}}

    class RefCOCOgDataset(Dataset):
        def __init__(self, dataset, transform=None, data_dir='dataset/refcocog'):
            super(RefCOCOgDataset, self).__init__()
            self.data_dir = data_dir
            self.transform = transform
            self.dataset = dataset
            print(self.dataset)

        def __getitem__(self, idx):
            data_item = {i: torch.tensor(v[idx]) for i, v in self.dataset.items() if idx in self.dataset.keys()}
            print(idx)
            print(self.dataset[idx])
            fname = os.path.join(self.data_dir+'/images/', self.dataset[idx]['image'])
            #if os.path.exists(fname):
            img = Image.open(fname).convert('RGB')
            image = self.transform(img)['image']
            data_item['image'] = image.permute(2, 0, 1).float()
            data_item['texts'] = self.dataset[idx]['texts']
            print(data_item[idx])
            return data_item

        def __len__(self):
            return len(self.dataset)
My question is: how do I map my dataset correctly so that it doesn't raise a `KeyError` and gets loaded correctly by the `DataLoader`? I want `idx` to iterate through the keys of my `dataset` dict (for example, `idx=939`) so that the image and texts are mapped as in the code I have written. Is that possible? Sorry, I am not clear on how iteration over `__getitem__` works. Could someone please shed some light on this?
Thanks in advance.
You revealed only a subset of what is happening, instead of offering a reprex (minimal reproducible example).
Apparently some dataset `ds` had a `939` key. But up in a hidden portion of the call stack, you dereferenced `ds[0]`.
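Presumably those hidden frames are PyTorch's `DataLoader`. For a map-style dataset its default sampler does not look at your dict's keys at all: it asks for integer positions `0` through `len(dataset) - 1` (in order with `shuffle=False`, as a random permutation of the same positions with `shuffle=True`). Below is a simplified, standard-library-only model of that mismatch; `positions_requested` is a hypothetical stand-in for the sampler, and the `1042` entry is made up alongside the question's `939`.

```python
from collections import defaultdict

# Keys are annotation/image IDs, not positions. 939 is the question's sample;
# 1042 is a made-up second entry for illustration.
dataset = defaultdict(dict)
dataset[939] = {"image": "COCO_train2014_000000163939.jpg",
                "texts": {0: "the right half of a keyboard"}}
dataset[1042] = {"image": "hypothetical_000001042.jpg",
                 "texts": {0: "a made-up caption"}}


def positions_requested(n):
    """Simplified model of a non-shuffling sampler: positions 0, 1, ..., n - 1."""
    return list(range(n))


requested = positions_requested(len(dataset))
print(requested)                      # [0, 1]
print(sorted(dataset))                # [939, 1042]
print(set(requested) & set(dataset))  # set(): no requested position is a real key
```

So the very first request, position `0`, is a key the dict never had.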
    def __getitem__(self, idx):
        data_item = {i: torch.tensor(v[idx]) for i, v in self.dataset.items() if idx in self.dataset.keys()}
        print(idx)
        print(self.dataset[idx])
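So `0` is passed in, and it corresponds to an empty `dict`. The pair of prints reports this when they say `0` and `{}`. (Because `self.dataset` is a `defaultdict(dict)`, looking up the missing key `0` does not raise; it silently inserts and returns an empty `dict`.)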
Given that, it is hardly surprising that attempting to dereference `self.dataset[0]['image']` would report `KeyError`.
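Here is a standard-library reprex of exactly that failure, built from the question's sample entry and assuming the dataset is a `defaultdict(dict)` as the printed output shows. It also shows a side effect worth knowing: the failed lookup adds key `0`, so `len(dataset)` grows.

```python
from collections import defaultdict

# The question's sample entry, stored the same way (a defaultdict of dicts).
dataset = defaultdict(dict)
dataset[939] = {"image": "COCO_train2014_000000163939.jpg",
                "texts": {0: "the right half of a keyboard"}}

item = dataset[0]       # missing key: defaultdict inserts and returns {}
print(item)             # {}
print(0 in dataset)     # True: the lookup above added key 0
print(len(dataset))     # 2

caught = None
try:
    item["image"]       # same access as self.dataset[idx]['image']
except KeyError as exc:
    caught = exc
print("KeyError:", caught)  # KeyError: 'image'
```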
Decompose your problem further, write one or more unit tests, and show us how the new code behaves in your use case. As presented, the specification is unclear.
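On whether `idx` can reach keys like `939`: one possible approach (not the only one) is to freeze the dict's keys into a list once and translate each position into a key inside `__getitem__`. This is a minimal sketch under stated assumptions: it is standard-library only so it runs without PyTorch (in real code you would subclass `torch.utils.data.Dataset` as in the question), and `KeyedDataset`, its `load_image` hook and the returned `"key"` field are hypothetical names. The question's `Image.open(...)` and transform code would go in `load_image`. The last lines double as the kind of unit test suggested above.

```python
import os
from collections import defaultdict


class KeyedDataset:
    """Map-style dataset over a dict keyed by arbitrary IDs (e.g. 939).

    The loader asks for positions 0..len-1, so each position is translated
    into a real key via a list of keys built once in __init__.
    """

    def __init__(self, records, data_dir="dataset/refcocog", load_image=None):
        # Copy into a plain dict so a bad lookup raises KeyError
        # instead of silently inserting an empty entry (defaultdict behavior).
        self.records = dict(records)
        self.keys = sorted(self.records)  # position -> dataset key
        self.data_dir = data_dir
        self.load_image = load_image      # hypothetical hook: path -> image

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        key = self.keys[idx]              # e.g. position 0 -> key 939
        record = self.records[key]
        fname = os.path.join(self.data_dir, "images", record["image"])
        # Without a loader hook, return the path so the mapping can be tested.
        image = self.load_image(fname) if self.load_image else fname
        return {"key": key, "image": image, "texts": record["texts"]}


# Usage / unit test with the question's sample entry.
raw = defaultdict(dict)
raw[939] = {"image": "COCO_train2014_000000163939.jpg",
            "texts": {0: "the right half of a keyboard"}}

ds = KeyedDataset(raw)
sample = ds[0]
assert len(ds) == 1
assert sample["key"] == 939
assert sample["image"] == os.path.join(
    "dataset/refcocog", "images", "COCO_train2014_000000163939.jpg")
```

Sorting the keys keeps the position-to-key mapping deterministic between runs, and an out-of-range position raises `IndexError` from the key list, which is what a map-style loader expects at the end of a sequence.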