pytorch, huggingface-transformers, language-model

RuntimeError: CUDA error: device-side assert triggered - BART model


I am trying to run the BART language model for a text generation task.

My code was working fine when I used it with another encoder-decoder model (T5), but with BART I am getting this error:

File "train_bart.py", line 89, in train
    outputs = model(input_ids = ids, attention_mask = mask, decoder_input_ids=y_ids, labels=lm_labels)
  File ".../venv/tf_23/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../venv/tf_23/lib/python3.6/site-packages/transformers/models/bart/modeling_bart.py", line 1308, in forward
    return_dict=return_dict,
  File ".../venv/tf_23/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../venv/tf_23/lib/python3.6/site-packages/transformers/models/bart/modeling_bart.py", line 1196, in forward
    return_dict=return_dict,
  File ".../venv/tf_23/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../venv/tf_23/lib/python3.6/site-packages/transformers/models/bart/modeling_bart.py", line 985, in forward
    attention_mask, input_shape, inputs_embeds, past_key_values_length
  File ".../venv/tf_23/lib/python3.6/site-packages/transformers/models/bart/modeling_bart.py", line 866, in _prepare_decoder_attent
ion_mask
    ).to(self.device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

And this is where the error happens:

for _, data in tqdm(enumerate(loader), total=len(loader), desc='Processing batches...'):
    y = data['target_ids'].to(device, dtype=torch.long)
    # Shift the targets for teacher forcing: the decoder inputs drop the
    # last token, the labels drop the first.
    y_ids = y[:, :-1].contiguous()
    lm_labels = y[:, 1:].clone().detach()
    # Mask padding positions so they are ignored by the loss.
    lm_labels[y[:, 1:] == tokenizer.pad_token_id] = -100
    ids = data['source_ids'].to(device, dtype=torch.long)
    mask = data['source_mask'].to(device, dtype=torch.long)

    outputs = model(input_ids=ids, attention_mask=mask, decoder_input_ids=y_ids, labels=lm_labels)
    loss = outputs[0]

loader is a DataLoader that yields batches of the tokenized and preprocessed data.
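
As the traceback itself notes, CUDA kernel errors are reported asynchronously, so the stack trace may point at the wrong call. One way to make the failing op raise at its actual call site is to force synchronous kernel launches before CUDA is initialized (a generic debugging sketch, not specific to this script):

import os

# Must be set before torch initializes CUDA; equivalently, launch the
# script as `CUDA_LAUNCH_BLOCKING=1 python train_bart.py`.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import torch  # imported only after setting the env var, on purpose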


Solution

  • After fighting with this for many hours, I found that the error was due to adding new tokens to the BART tokenizer. I therefore needed to resize the model's input embedding matrix (a full sketch follows at the end of this answer):

    model.resize_token_embeddings(len(tokenizer))
    

    What is still not clear to me is that, without resizing the embedding matrix, I was able to fine-tune the T5 model without any problem, but not BART.

    Maybe this is because BART shares weights between the input embeddings and the output layer (I am not sure of this either).
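
    For reference, here is a minimal end-to-end sketch of the fix. The checkpoint name and the added tokens below are placeholders, not my actual setup:

    from transformers import BartForConditionalGeneration, BartTokenizer

    # Placeholder checkpoint and tokens; substitute your own.
    tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')
    model = BartForConditionalGeneration.from_pretrained('facebook/bart-base')

    num_added = tokenizer.add_tokens(['<new_tok_1>', '<new_tok_2>'])
    if num_added > 0:
        # Grow the embedding matrix so the new token ids are in range;
        # without this, embedding lookups for the new ids trip the
        # device-side assert.
        model.resize_token_embeddings(len(tokenizer))

    # BART ties the input embeddings to the LM head by default
    # (config.tie_word_embeddings), so resizing updates both:
    assert model.get_input_embeddings().weight.data_ptr() == \
           model.get_output_embeddings().weight.data_ptr()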