I did some Llama 2 fine-tuning with AutoTrain on Google Colab. This is a sample of the text column used for fine-tuning:
###Human:
Here is the OCR Text extracted from a VHS tape cover. Yes, the text is surely extracted from a VHS tape, but it may have some typos and wrong characters
hi fi EMBASSY HOME ENTERTAINMENT PSYCHIC KILLER A WHOLE NEW KIND OF MURDER Mardi Rustam presents PSYCHIC KILLER Starring PAUL BURKE JIM HUTTON JULIE ADAMS NEHEMIAH PERSOFF NEVILLE BRAND Co starring ALDO RAY WHIT BISSELL ROD CAMERON and DELLA REESE Executive producer MOHAMMED RUSTAM Written by GREYDON CLARK MIKE ANGEL RAYMOND DANTON Produced by MARDI RUSTAM Directed by RAYMOND DANTON An E Embassy Pictures Release
And here is the possibly matched database entry
title Psychic killer, distributor Embassy Home Entertainment, vhs release year 1975, crew Jim Hutton Paul Burke Neville Brand, synopsis A man wrongly accused of murder and locked in a mental institution seeks revenge through his newly learned psychic gifts His deadly revenge is terrifying Jim Hutton stars
Instructions for Matching VHS Tape Details to Database Entry
~*~SURELY~*~ Confirm if both the title and VHS home video distributor from the OCR text match the database entry.
~*~POSSIBLE~*~ Indicate if only the title matches, and you're uncertain about the distributor.
~*~IMPOSSIBLE~*~ Indicate if neither the title nor the distributor matches, or if the tape and distributor are clearly not the same.
Guidelines
Focus on the title of the movie, VHS home video distributor (not theatrical distributor), VHS release year (not theatrical release year), and cast and crew members.
Consider that the synopsis was manually added to the database and may have discrepancies, and some missing information such the crew.
Explain your reasoning briefly, optimized for ML data training enrichment, but avoid rewriting details of all possible matches.
###Assistant:
- ~*~SURELY~*~ The title of the movie "Psychic Killer" match both in OCR extracted text and database entry. The distributor "Embassy Home Entertainment" also matches in both the sources. Although the release year is not mentioned in the VHS tape, it matches the common names of the cast members such as Jim Hutton and Paul Burke. Therefore, it is a confirmed match.
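For reference, a minimal sketch of how samples like this can be assembled into the single text column AutoTrain expects (the build_sample helper and the train.csv file name are my own illustration, not AutoTrain APIs; --data_path . with --text_column text in the command below picks the file up):

import pandas as pd

# hypothetical helper: the sft trainer consumes one plain-text column, so the
# prompt and the expected answer are concatenated into a single string
def build_sample(ocr_text, db_entry, answer):
    return (
        "###Human:\n"
        "Here is the OCR Text extracted from a VHS tape cover. ...\n"
        f"{ocr_text}\n"
        "And here is the possibly matched database entry\n"
        f"{db_entry}\n"
        "###Assistant:\n"
        f"{answer}"
    )

samples = [("hi fi EMBASSY HOME ENTERTAINMENT ...", "title Psychic killer, ...", "- ~*~SURELY~*~ ...")]
pd.DataFrame({"text": [build_sample(o, d, a) for o, d, a in samples]}).to_csv("train.csv", index=False)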
This is how I run AutoTrain:
!autotrain llm --train --project_name "llm-vhsfinetune-200s" \
--model abhishek/llama-2-7b-hf-small-shards \
--data_path . \
--text_column text \
--use_peft \
--use_int4 \
--learning_rate 2e-4 \
--train_batch_size 2 \
--num_train_epochs 1 \
--model_max_length 1024 \
--trainer sft \
--push_to_hub \
--repo_id "soajan/llm2-vhsfinetune-200s" \
--block_size 1024 > training.log
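Note that with --use_peft, what ends up on the Hub is the LoRA adapter rather than merged full weights (the download trace further below also shows adapter_config.json being fetched). One way to confirm what actually got uploaded, a sketch using huggingface_hub:

from huggingface_hub import list_repo_files

# if this lists adapter_config.json / adapter weights instead of full model
# weights, transformers routes loading through its PEFT adapter path, which
# is where the version check that fails below gets triggered
print(list_repo_files("soajan/llm2-vhsfinetune-200s"))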
After training is done, I'm trying to load and test the model from the Hugging Face Hub:
import torch  # needed for torch.bfloat16 below
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "soajan/llm2-vhsfinetune-200s"

# 4-bit NF4 quantization config, matching the --use_int4 training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},  # put everything on GPU 0
)
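For completeness, this is the kind of quick generation test I planned to run once loading succeeds (a sketch; the prompt is just an example in the training format):

prompt = "###Human:\nHere is the OCR Text extracted from a VHS tape cover. ...\n###Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))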
But loading never gets that far; I'm getting the error message below, stating:
ValueError: The version of PEFT you are using is not compatible,
please use a version that is greater than 0.4.0
Full output & error trace:
Downloading (…)okenizer_config.json: 100%
705/705 [00:00<00:00, 52.3kB/s]
Downloading tokenizer.model: 100%
500k/500k [00:00<00:00, 507kB/s]
Downloading (…)/main/tokenizer.json: 100%
1.84M/1.84M [00:00<00:00, 3.73MB/s]
Downloading (…)in/added_tokens.json: 100%
21.0/21.0 [00:00<00:00, 1.21kB/s]
Downloading (…)cial_tokens_map.json: 100%
435/435 [00:00<00:00, 33.4kB/s]
Downloading (…)/adapter_config.json: 100%
458/458 [00:00<00:00, 35.5kB/s]
Loading checkpoint shards: 100%
10/10 [02:20<00:00, 10.89s/it]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-69-1fbd90a0393c> in <cell line: 13>()
11
12 tokenizer = AutoTokenizer.from_pretrained(model_id)
---> 13 model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
3 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
533 elif type(config) in cls._model_mapping.keys():
534 model_class = _get_model_class(config, cls._model_mapping)
--> 535 return model_class.from_pretrained(
536 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
537 )
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3223
3224 if has_adapter_config:
-> 3225 model.load_adapter(
3226 adapter_model_id,
3227 adapter_name=adapter_name,
/usr/local/lib/python3.10/dist-packages/transformers/lib_integrations/peft/peft_mixin.py in load_adapter(self, peft_model_id, adapter_name, revision, token, device_map, max_memory, offload_folder, offload_index)
114 `offload_index` argument to be passed to `accelerate.dispatch_model` method.
115 """
--> 116 check_peft_version(min_version="0.4.0")
117
118 adapter_name = adapter_name if adapter_name is not None else "default"
/usr/local/lib/python3.10/dist-packages/transformers/utils/peft_utils.py in check_peft_version(min_version)
93
94 if not is_peft_version_compatible:
---> 95 raise ValueError(
96 f"The version of PEFT you are using is not compatible, please use a version that is greater"
97 f" than {min_version}"
ValueError: The version of PEFT you are using is not compatible, please use a version that is greater than 0.4.0
I checked the PEFT version, and it is 0.5.0.dev0. Why might this be happening?
In transformers/utils/peft_utils.py:

is_peft_version_compatible = version.parse(importlib.metadata.version("peft")) <= version.parse(min_version)

The <= should be >=.
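To see why the inverted comparison rejects a newer PEFT, the check can be reproduced in a plain Python shell:

from packaging import version

min_version = "0.4.0"
installed = "0.5.0.dev0"

# the shipped check treats PEFT as compatible only if it is no newer than
# min_version, so any sufficiently recent install fails
print(version.parse(installed) <= version.parse(min_version))  # False -> ValueError
print(version.parse(installed) >= version.parse(min_version))  # True  -> intended check

Until a fixed transformers build is available, one workaround (a sketch, assuming the peft package is installed and reusing bnb_config from the loading cell above) is to load the base model directly and attach the adapter with PEFT yourself, which bypasses transformers' check_peft_version call:

from peft import PeftModel
from transformers import AutoModelForCausalLM

# load the plain base model (no adapter_config.json in that repo, so the
# PEFT version check is never reached), then apply the trained adapter
base = AutoModelForCausalLM.from_pretrained(
    "abhishek/llama-2-7b-hf-small-shards",
    quantization_config=bnb_config,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base, "soajan/llm2-vhsfinetune-200s")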