I'm experimenting with Hugging Face transformers to fine-tune microsoft/layoutlmv2-base-uncased through AutoModelForTokenClassification on a custom dataset similar to FUNSD (pre-processed and normalized). After a few training iterations, I get this error:
Traceback (most recent call last):
File "layoutlmV2/train.py", line 137, in <module>
trainer.train()
File "..../lib/python3.8/site-packages/transformers/trainer.py", line 1409, in train
return inner_training_loop(
File "..../lib/python3.8/site-packages/transformers/trainer.py", line 1651, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "..../lib/python3.8/site-packages/transformers/trainer.py", line 2345, in training_step
loss = self.compute_loss(model, inputs)
File "..../lib/python3.8/site-packages/transformers/trainer.py", line 2377, in compute_loss
outputs = model(**inputs)
File "..../lib/python3.8/site-packages/torch/nn/modules/module.py", line 1131, in _call_impl
return forward_call(*input, **kwargs)
File "..../lib/python3.8/site-packages/transformers/models/layoutlmv2/modeling_layoutlmv2.py", line 1228, in forward
outputs = self.layoutlmv2(
File "..../lib/python3.8/site-packages/torch/nn/modules/module.py", line 1131, in _call_impl
return forward_call(*input, **kwargs)
File "..../lib/python3.8/site-packages/transformers/models/layoutlmv2/modeling_layoutlmv2.py", line 902, in forward
text_layout_emb = self._calc_text_embeddings(
File "..../lib/python3.8/site-packages/transformers/models/layoutlmv2/modeling_layoutlmv2.py", line 753, in _calc_text_embeddings
spatial_position_embeddings = self.embeddings._calc_spatial_position_embeddings(bbox)
File "..../lib/python3.8/site-packages/transformers/models/layoutlmv2/modeling_layoutlmv2.py", line 93, in _calc_spatial_position_embeddings
h_position_embeddings = self.h_position_embeddings(bbox[:, :, 3] - bbox[:, :, 1])
File "..../lib/python3.8/site-packages/torch/nn/modules/module.py", line 1131, in _call_impl
return forward_call(*input, **kwargs)
File "..../lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "..../lib/python3.8/site-packages/torch/nn/functional.py", line 2203, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
After further inspection (vocab size, bboxes, dimensions, classes...), I noticed that the input tensor contains negative values, and these are what trigger the error; the input tensors of the successful earlier iterations contain non-negative integers only. The negative numbers are produced by _calc_spatial_position_embeddings(self, bbox) in modeling_layoutlmv2.py, line 92:
h_position_embeddings = self.h_position_embeddings(bbox[:, :, 3] - bbox[:, :, 1])
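This explains the crash: nn.Embedding is a plain lookup table, so any index outside [0, num_embeddings - 1], including a negative height from an inverted box, raises exactly this IndexError. A minimal sketch reproducing it (the embedding dimensions below are illustrative, not the model's actual config):

import torch
import torch.nn as nn

# stand-in for self.h_position_embeddings; sizes are made up for the demo
h_position_embeddings = nn.Embedding(1024, 128)

heights = torch.tensor([[11, 12, -6]])   # -6 mimics a box with y2 < y1
h_position_embeddings(heights)           # IndexError: index out of range in self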
Example of the input tensor that triggers the error in torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse):
tensor([[ 0, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12,
12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 12, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 5, 5, 5, 5, 5, 5, -6, -6, -6, -6, -6, -6, 1, 1, 1, 1, 1,
5, 5, 5, 5, 5, 5, 7, 5, 7, 7, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0]])
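To locate the offending samples, one can scan the batched bbox tensor for boxes with x2 < x1 or y2 < y1 before it reaches the model. A quick diagnostic sketch (find_bad_boxes is a hypothetical helper; bbox has the model's expected shape (batch, seq_len, 4) with (x1, y1, x2, y2) coordinates):

def find_bad_boxes(bbox):
    # boxes whose width or height would be negative after the subtraction
    bad = (bbox[:, :, 2] < bbox[:, :, 0]) | (bbox[:, :, 3] < bbox[:, :, 1])
    return bad.nonzero(as_tuple=False)   # (sample_idx, token_idx) pairs

# e.g. print(find_bad_boxes(batch["bbox"]))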
After double-checking the dataset, and specifically the coordinates of the labels, I found that some rows' bbox coordinates produce a zero or even negative width or height. Here's a simplified example:
x1, y1, x2, y2 = dataset_row["bbox"]
print((x2 - x1 < 1) or (y2 - y1 < 1))  # output is sometimes True
After removing these labels from the dataset, the issue was resolved.
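For anyone hitting the same issue, the cleanup can be done with datasets before encoding, keeping words, boxes, and labels aligned. A sketch assuming FUNSD-style column names (words / bboxes / ner_tags; adjust to your schema):

def drop_degenerate_boxes(example):
    # keep only entries whose box has strictly positive width and height
    keep = [i for i, (x1, y1, x2, y2) in enumerate(example["bboxes"])
            if (x2 - x1 >= 1) and (y2 - y1 >= 1)]
    return {
        "words": [example["words"][i] for i in keep],
        "bboxes": [example["bboxes"][i] for i in keep],
        "ner_tags": [example["ner_tags"][i] for i in keep],
    }

dataset = dataset.map(drop_degenerate_boxes)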