I'm learning Quantization and am experimenting with Section 1 of this notebook.
I want to use this code on my own models.
Hypothetically, I only need to assign to the model variable in Section 1.2:
# load model
model = BertForSequenceClassification.from_pretrained(configs.output_dir)
model.to(configs.device)
My models are loaded in a different way, via from transformers import pipeline, so .to() throws an AttributeError.
My Model:
pip install transformers
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
model = unmasker("Hello I'm a [MASK] model.")
Output:
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
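For extra context, here is a minimal check of what model actually holds after the pipeline call, which I believe is why .to() fails:
# 'model' above is the pipeline's prediction output: a list of dicts with
# keys such as 'score', 'token', 'token_str' and 'sequence', not a torch module.
print(type(model))  # <class 'list'>
model.to("cpu")     # AttributeError: 'list' object has no attribute 'to'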
How might I run the linked Quantization code on my example model?
Please let me know if there's anything else I should clarify in this post.
The pipeline approach won't work for Quantisation, as we need the model object itself to be returned. You can, however, use pipeline for testing the original models, e.g. for timing.
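For example, a rough timing sketch with the pipeline for the original (unquantised) model; the sentence and repetition count are arbitrary placeholders:
import time
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

start = time.perf_counter()
for _ in range(10):  # arbitrary number of runs for a rough average
    unmasker("Hello I'm a [MASK] model.")
print(f"Average pipeline latency: {(time.perf_counter() - start) / 10:.3f}s")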
Quantisation Code:
token_logits contains the output tensors of the quantised model. You could place a for-loop around this code and replace model_name with each string from a list of model names (see the sketch after the code below).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)  # returns the model object itself

sequence = "Distilled models are smaller than the models they mimic. Using them instead of the large " \
           f"versions would help {tokenizer.mask_token} our carbon footprint."

inputs = tokenizer(sequence, return_tensors="pt")
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]  # position of [MASK]
token_logits = model(**inputs).logits  # <- can stop here
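A minimal sketch of that loop, assuming you then quantise each model with PyTorch dynamic quantisation (the model list and the quantisation call are illustrative; the linked notebook may use a different scheme):
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_names = ["bert-base-uncased", "distilbert-base-uncased"]  # hypothetical list

for model_name in model_names:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # One common approach: dynamically quantise the Linear layers to int8 (CPU).
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    sequence = f"Using distilled models would help {tokenizer.mask_token} our carbon footprint."
    inputs = tokenizer(sequence, return_tensors="pt")
    token_logits = quantized_model(**inputs).logits  # logits from the quantised model
    print(model_name, tuple(token_logits.shape))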