I am trying to build a pipeline in HuggingFace which will not use the positional embeddings in BERT, in order to study the role of the embeddings for a particular use case. I have looked through the documentation and the code, but I have not been able to find a way to implement a model like that. Will I need to modify BERT source code, or is there a configuration I can fiddle around with?
You can work around this by setting the position embedding weights to zeros. When you inspect the embeddings part of BERT, you can see that the position embeddings live in a separate PyTorch module:
from transformers import AutoModel
bert = AutoModel.from_pretrained("bert-base-cased")
print(bert.embeddings)
BertEmbeddings(
  (word_embeddings): Embedding(28996, 768, padding_idx=0)
  (position_embeddings): Embedding(512, 768)
  (token_type_embeddings): Embedding(2, 768)
  (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
)
You can assign the position embedding parameters any values you want, including zeros, which effectively disables the position embeddings:
import torch
bert.embeddings.position_embeddings.weight.data = torch.zeros((512, 768))
If you plan to fine-tune the modified model, make sure the zeroed parameters do not get updated by freezing them:
bert.embeddings.position_embeddings.requires_grad_(False)
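To convince yourself this really removes positional information, here is a quick sanity check (just a sketch, assuming bert-base-cased and its matching tokenizer): with zeroed position embeddings and dropout disabled, BERT's self-attention has no notion of token order, so permuting the input tokens should simply permute the output hidden states.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")
bert.embeddings.position_embeddings.weight.data.zero_()
bert.eval()  # disable dropout so the comparison is deterministic

ids = tokenizer("the cat sat on the mat", return_tensors="pt")["input_ids"]
perm = torch.randperm(ids.shape[1])

with torch.no_grad():
    out = bert(ids).last_hidden_state
    out_perm = bert(ids[:, perm]).last_hidden_state

# With no positional information, the hidden states should match up to the
# same permutation, apart from tiny floating-point noise.
print((out[:, perm] - out_perm).abs().max())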
Bypassing the position embeddings this way might work well when you train a model from scratch. With a pre-trained model, removing these parameters can confuse the model quite a bit, so more fine-tuning data might be needed. In that case, there might be better strategies for replacing the position embeddings, e.g., using the average of the pre-trained embeddings over all positions.
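If you want to try that, a minimal sketch of the averaging idea (my own illustration, not a built-in Transformers option) is to replace every row of the position embedding matrix with the mean of the pre-trained rows, so the per-position differences vanish while the overall scale the pre-trained weights expect is roughly preserved:

import torch
from transformers import AutoModel

bert = AutoModel.from_pretrained("bert-base-cased")
pos = bert.embeddings.position_embeddings.weight.data         # shape (512, 768)
mean_pos = pos.mean(dim=0, keepdim=True)                      # average over all positions
bert.embeddings.position_embeddings.weight.data = mean_pos.repeat(pos.size(0), 1)
bert.embeddings.position_embeddings.requires_grad_(False)     # keep it fixed during fine-tuning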