python amazon-sagemaker llama3.1

ValueError: 'rope_scaling' must be a dictionary with two fields, 'type' and 'factor'


When training the Llama-3.1-8B-Instruct model on Amazon SageMaker, the training job fails with the following output:

./usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "/workspace/train.py", line 85, in <module>
    main()
  File "/workspace/train.py", line 48, in main
    config = AutoConfig.from_pretrained(model_name, token=use_auth_token)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1124, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 764, in from_dict
    config = cls(**config_dict)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/configuration_llama.py", line 160, in __init__
    self._rope_scaling_validation()
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/configuration_llama.py", line 180, in _rope_scaling_validation
    raise ValueError(
ValueError: `rope_scaling` must be a dictionary with with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

I tried overriding config.rope_scaling and passing the modified config to the model, but the job still fails with the same error. This is the snippet where I change the config:

from transformers import AutoConfig, LlamaForCausalLM

# Load model configuration (model_name and use_auth_token are defined earlier in train.py)
config = AutoConfig.from_pretrained(model_name, token=use_auth_token)

# Modify the rope_scaling config
config.rope_scaling = {
    "type": "llama3",
    "factor": 8.0
}

# Initialize the model with the modified config
model = LlamaForCausalLM.from_pretrained(model_name, token=use_auth_token, config=config)
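
Note that the traceback points at the AutoConfig.from_pretrained call itself (train.py, line 48), so the exception is raised while the config is being loaded, before the override above has a chance to run. The following is a minimal sketch of the kind of check the error message implies. It is a simplified stand-in written for illustration, not the library's actual code, and the set of accepted type names (LEGACY_TYPES) is an assumption.

```python
# Simplified stand-in for the older rope_scaling validation (illustration only,
# NOT copied from transformers).
LEGACY_TYPES = {"linear", "dynamic"}  # assumption: types accepted by older releases


def legacy_rope_scaling_check(rope_scaling):
    """Raise ValueError unless rope_scaling is None or {'type': ..., 'factor': ...}."""
    if rope_scaling is None:
        return
    # Exactly two fields are expected, so the five-field Llama 3.1 dict fails here.
    if not isinstance(rope_scaling, dict) or len(rope_scaling) != 2:
        raise ValueError(
            "`rope_scaling` must be a dictionary with two fields, "
            f"`type` and `factor`, got {rope_scaling}"
        )
    if rope_scaling.get("type") not in LEGACY_TYPES:
        raise ValueError(f"unsupported rope_scaling type: {rope_scaling.get('type')!r}")
    factor = rope_scaling.get("factor")
    if not isinstance(factor, float) or factor <= 1.0:
        raise ValueError(f"rope_scaling factor must be a float > 1, got {factor!r}")


def is_accepted(rope_scaling) -> bool:
    """Return True if the stand-in check accepts the given rope_scaling value."""
    try:
        legacy_rope_scaling_check(rope_scaling)
    except ValueError:
        return False
    return True


# The value reported in the traceback for Llama 3.1.
llama31_rope_scaling = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}

print(is_accepted(llama31_rope_scaling))                 # five fields: rejected
print(is_accepted({"type": "llama3", "factor": 8.0}))    # two fields, unknown type: rejected
print(is_accepted({"type": "linear", "factor": 2.0}))    # legacy format: accepted
```

Under these assumptions, trimming the dictionary to two fields would not help either, because "llama3" is not a scaling type the older check recognises, and dropping the extra fields would also change the model's positional encoding.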

Solution

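  • I hit the same error using Llama 3.1 with transformers 4.41.0. That release only accepts the older two-field `rope_scaling` format, while the Llama 3.1 config ships the extended format shown in the error message (with `rope_type` set to `llama3`). Upgrading resolved it:

    pip install --upgrade transformers

    With transformers 4.44.2 the config loads and training runs without this error. See the related GitHub issue for more details.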
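
To catch this early in a training job, the script can check the installed transformers version before loading the config. The sketch below is one way to do that with the standard library only. The helper names are hypothetical, and MIN_VERSION = "4.43.0" is an assumption about the first release that understands the Llama 3.1 format; the answer above only confirms that 4.41.0 fails and 4.44.2 works.

```python
import re
from importlib import metadata

# Assumption: first transformers release that accepts the Llama 3.1 rope_scaling format.
MIN_VERSION = "4.43.0"


def version_tuple(version: str) -> tuple:
    """Return the leading numeric parts of a version string, padded to three items.

    Suffixes such as '.dev0' or 'rc1' are ignored, so '4.43.0.dev0' -> (4, 43, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def supports_llama31_rope(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Return True if the installed version is at least the assumed minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_installed_transformers() -> None:
    """Raise RuntimeError with an actionable message if transformers is missing or too old."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError as exc:
        raise RuntimeError("transformers is not installed") from exc
    if not supports_llama31_rope(installed):
        raise RuntimeError(
            f"transformers {installed} is older than {MIN_VERSION}; "
            "run `pip install --upgrade transformers` before loading Llama 3.1"
        )
```

Calling check_installed_transformers() at the top of train.py, before AutoConfig.from_pretrained, would replace the confusing rope_scaling ValueError with a message that names the actual cause.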