I am using the Language Interpretability Toolkit (LIT) to load and analyze a BERT model that I pre-trained on an NER task.
However, when I'm starting the LIT script with the path to my pre-trained model passed to it, it fails to initialize the weights and tells me:
modeling_utils.py:648] loading weights file bert_remote/examples/token-classification/Data/Models/results_21_03_04_cleaned_annotations/04.03._8_16_5e-5_cleaned_annotations/04-03-2021 (15.22.23)/pytorch_model.bin
modeling_utils.py:739] Weights of BertForTokenClassification not initialized from pretrained model: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
modeling_utils.py:745] Weights from pretrained model not used in BertForTokenClassification: ['bert.embeddings.position_ids']
It then simply uses the bert-base-german-cased
version of BERT, which of course doesn't have my custom labels and thus fails to predict anything. I think it might have to do with PyTorch, but I can't find the error.
If relevant, here is how I load my dataset into CoNLL 2003 format (modification of the dataloader scripts found here):
def __init__(self):
# Read ConLL Test Files
self._examples = []
data_path = "lit_remote/lit_nlp/examples/datasets/NER_Data"
with open(os.path.join(data_path, "test.txt"), "r", encoding="utf-8") as f:
lines = f.readlines()
for line in lines[:2000]:
if line != "\n":
token, label = line.split(" ")
self._examples.append({
'token': token,
'label': label,
})
else:
self._examples.append({
'token': "\n",
'label': "O"
})
def spec(self):
return {
'token': lit_types.Tokens(),
'label': lit_types.SequenceTags(align="token"),
}
And this is how I initialize the model and start the LIT server (modification of the simple_pytorch_demo.py
script found here):
def __init__(self, model_name_or_path):
self.tokenizer = transformers.AutoTokenizer.from_pretrained(
model_name_or_path)
model_config = transformers.AutoConfig.from_pretrained(
model_name_or_path,
num_labels=15, # FIXME CHANGE
output_hidden_states=True,
output_attentions=True,
)
# This is a just a regular PyTorch model.
self.model = _from_pretrained(
transformers.AutoModelForTokenClassification,
model_name_or_path,
config=model_config)
self.model.eval()
## Some omitted snippets here
def input_spec(self) -> lit_types.Spec:
return {
"token": lit_types.Tokens(),
"label": lit_types.SequenceTags(align="token")
}
def output_spec(self) -> lit_types.Spec:
return {
"tokens": lit_types.Tokens(),
"probas": lit_types.MulticlassPreds(parent="label", vocab=self.LABELS),
"cls_emb": lit_types.Embeddings()
This actually seems to be expected behaviour. In the documentation of the GPT models the HuggingFace team writes:
This will issue a warning about some of the pretrained weights not being used and some weights being randomly initialized. That’s because we are throwing away the pretraining head of the BERT model to replace it with a classification head which is randomly initialized.
So it seems to not be a problem for the fine-tuning. In my use case described above it worked despite the warning as well.