So I am fine-tuning a pretrained LLaMA 2 model and want to verify that the fine-tuned model is actually different from the original, i.e. I want to compare base_model and model. Is there a way to check whether the weights or parameters have changed after training?
import torch
from google.colab import drive
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Mount Google Drive
drive.mount('/content/drive')

# Path to your saved model in Google Drive
model_path_in_drive = '/content/drive/MyDrive/Mod/llama-2-7b-miniguanaco'

# Reload the base model in FP16 and merge it with the LoRA weights
# (model_name and device_map are defined earlier in the notebook)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)

# Load your PeftModel from the saved checkpoint in Google Drive
model = PeftModel.from_pretrained(base_model, model_path_in_drive)
model = model.merge_and_unload()
# mark_only_lora_as_trainable(lora_model)

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
I tried some of the methods from this article, but they didn't help me at all.
Just loop over the parameters and compare them with torch.allclose. I used DistilBertModel for this answer; substitute the respective classes from your example, which are also noted in the comments:
import torch
from transformers import DistilBertModel

# AutoModelForCausalLM in your case
base_model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")

# PeftModel.merge_and_unload() in your case
finetuned_model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")

# Print the name of every parameter tensor whose values differ
for (name, base_param), (_, finetuned_param) in zip(
    base_model.named_parameters(), finetuned_model.named_parameters()
):
    if not torch.allclose(base_param, finetuned_param):
        print(name)
Output:
embeddings.word_embeddings.weight
embeddings.position_embeddings.weight
embeddings.LayerNorm.weight
embeddings.LayerNorm.bias
transformer.layer.0.attention.q_lin.weight
transformer.layer.0.attention.q_lin.bias
transformer.layer.0.attention.k_lin.weight
transformer.layer.0.attention.k_lin.bias
transformer.layer.0.attention.v_lin.weight
transformer.layer.0.attention.v_lin.bias
transformer.layer.0.attention.out_lin.weight
transformer.layer.0.attention.out_lin.bias
transformer.layer.0.sa_layer_norm.weight
transformer.layer.0.sa_layer_norm.bias
transformer.layer.0.ffn.lin1.weight
transformer.layer.0.ffn.lin1.bias
transformer.layer.0.ffn.lin2.weight
transformer.layer.0.ffn.lin2.bias
transformer.layer.0.output_layer_norm.weight
transformer.layer.0.output_layer_norm.bias
...
transformer.layer.5.attention.q_lin.weight
transformer.layer.5.attention.q_lin.bias
transformer.layer.5.attention.k_lin.weight
transformer.layer.5.attention.k_lin.bias
transformer.layer.5.attention.v_lin.weight
transformer.layer.5.attention.v_lin.bias
transformer.layer.5.attention.out_lin.weight
transformer.layer.5.attention.out_lin.bias
transformer.layer.5.sa_layer_norm.weight
transformer.layer.5.sa_layer_norm.bias
transformer.layer.5.ffn.lin1.weight
transformer.layer.5.ffn.lin1.bias
transformer.layer.5.ffn.lin2.weight
transformer.layer.5.ffn.lin2.bias
transformer.layer.5.output_layer_norm.weight
transformer.layer.5.output_layer_norm.bias
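If you want to run the same check against your merged LLaMA 2 model, here is a minimal sketch (assuming model is the merged model and model_name / device_map are the same variables from your question; reference_model and changed are just names chosen here). Note that merge_and_unload() merges the LoRA deltas into the base model it wrapped, so reload a fresh copy of the base weights to compare against:

import torch
from transformers import AutoModelForCausalLM

# Sketch only: reload a clean copy of the base weights to compare against,
# because merge_and_unload() writes the LoRA deltas into the model it wrapped,
# so the original base_model object no longer holds the pristine weights.
reference_model = AutoModelForCausalLM.from_pretrained(
    model_name,                      # same model_name as in the question
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map=device_map,
)

changed = []
for (name, ref_param), (_, merged_param) in zip(
    reference_model.named_parameters(), model.named_parameters()
):
    merged = merged_param.detach().to(ref_param.device)
    if not torch.allclose(ref_param, merged):
        changed.append(name)
        # Report the size of the change, not just that one exists
        print(name, (merged - ref_param).abs().max().item())

print(f"{len(changed)} parameter tensors differ from the base model")

With LoRA, only the modules targeted by the adapter (typically the attention projections) should show differences; everything else should remain identical to the base model.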