Tags: python, artificial-intelligence, huggingface-transformers, peft

I want to merge my PEFT adapter model with the base model and make a fully new model


As the title says, I want to merge the PEFT LoRA adapter (ArcturusAI/Crystalline-1.1B-v23.12-tagger) that I trained earlier with its base model (TinyLlama/TinyLlama-1.1B-Chat-v0.6) and make a fully new model.

And I got this code from ChatGPT:

from transformers import AutoModel, AutoConfig

# Load the pretrained model and LoRA adapter
pretrained_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v0.6"

pretrained_model = AutoModel.from_pretrained(pretrained_model_name)
lora_adapter = AutoModel.from_pretrained("ArcturusAI/Crystalline-1.1B-v23.12-tagger")

# Assuming the models have the same architecture (encoder, decoder, etc.)
# Get the weights of each model
pretrained_weights = pretrained_model.state_dict()
lora_adapter_weights = lora_adapter.state_dict()

# Combine the weights (adjust the weights based on your preference)
combined_weights = {}
for key in pretrained_weights:
    combined_weights[key] = 0.8 * pretrained_weights[key] + 0.2 * lora_adapter_weights[key]

# Load the combined weights into the pretrained model
pretrained_model.load_state_dict(combined_weights)

# Save the integrated model
pretrained_model.save_pretrained("ArcturusAI/Crystalline-1.1B-v23.12-tagger-fullmodel")

And I got this error:

---------------------------------------------------------------------------

OSError                                   Traceback (most recent call last)

<ipython-input-1-d2120d727884> in <cell line: 6>()
      4 pretrained_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v0.6"
      5 pretrained_model = AutoModel.from_pretrained(pretrained_model_name)
----> 6 lora_adapter = AutoModel.from_pretrained("ArcturusAI/Crystalline-1.1B-v23.12-tagger")
      7 
      8 # Assuming the models have the same architecture (encoder, decoder, etc.)

1 frames

/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3096                             )
   3097                         else:
-> 3098                             raise EnvironmentError(
   3099                                 f"{pretrained_model_name_or_path} does not appear to have a file named"
   3100                                 f" {_add_variant(WEIGHTS_NAME, variant)}, {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or"

OSError: ArcturusAI/Crystalline-1.1B-v23.12-tagger does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

I have no idea what I did wrong there. I would appreciate it if anyone could show me how to fix it, or tell me whether I'm heading in a completely wrong direction. Thank you.

I tried using transformers and PyTorch, expecting to merge both models and create a new model from them.


Solution

  • The adapter can't be loaded with AutoModel from transformers: the adapter repository only contains the LoRA weights and adapter_config.json, not a full set of model weights such as pytorch_model.bin, which is exactly what the OSError complains about. ChatGPT's suggestion of averaging the state dicts won't work either, because the adapter's low-rank matrices don't map one-to-one onto the base model's parameters. Luckily you don't need to rely on AI for that. The peft library has everything ready for you with merge_and_unload:

    from peft import AutoPeftModelForCausalLM
    
    # Local path, check post scriptum for explanation
    model_id = "./ArcturusAI/Crystalline-1.1B-v23.12-tagger"
    peft_model = AutoPeftModelForCausalLM.from_pretrained(model_id)
    print(type(peft_model))
    
    merged_model = peft_model.merge_and_unload()
    # The adapter is merged now and this is a plain transformers class again
    print(type(merged_model))
    

    Output:

    <class 'peft.peft_model.PeftModelForCausalLM'>
    <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
    

    You can now save merged_model with save_pretrained or do whatever you want with it.
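
    For example, to write the merged weights to a local folder (the path below simply reuses the naming from the question; any directory will do):

    merged_model.save_pretrained("./ArcturusAI/Crystalline-1.1B-v23.12-tagger-fullmodel")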

    Please note that this is only the model and not the tokenizer. You still need to load the tokenizer from the TinyLlama/TinyLlama-1.1B-Chat-v0.6 repo and save it with save_pretrained locally to have everything in one place:

    from transformers import AutoTokenizer
    t = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v0.6")
    # Save it next to the merged model (example path, pick your own)
    t.save_pretrained("./ArcturusAI/Crystalline-1.1B-v23.12-tagger-fullmodel")
    

    P.S.: I noticed that you trained the model with a different version of peft. Hence I downloaded the adapter locally and removed the following keys from its adapter_config.json:

    • loftq_config
    • megatron_config
    • megatron_core

    to be able to load it with peft==0.6.2.
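
    If you prefer to do this in code instead of editing the file by hand, a minimal sketch (assuming the adapter was downloaded to the local path used above) would be:

    import json

    config_path = "./ArcturusAI/Crystalline-1.1B-v23.12-tagger/adapter_config.json"
    with open(config_path) as f:
        config = json.load(f)

    # Drop the keys that prevent loading with peft==0.6.2
    for key in ("loftq_config", "megatron_config", "megatron_core"):
        config.pop(key, None)

    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)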