I'm trying to run the 'transformers' version of this code to use the new pre-trained BERTweet model, and I'm getting an error.
The following lines of code ran successfully in my Google Colab notebook:
!pip install fairseq
import fairseq
!pip install fastBPE
import fastBPE
# download the pre-trained BERTweet model zipped file
!wget https://public.vinai.io/BERTweet_base_fairseq.tar.gz
# unzip the pre-trained BERTweet model files
!tar -xzvf BERTweet_base_fairseq.tar.gz
!pip install transformers
import transformers
import torch
import argparse
from transformers import RobertaConfig
from transformers import RobertaModel
from fairseq.data.encoders.fastbpe import fastBPE
from fairseq.data import Dictionary
Then I tried to run the following code:
# Load model
config = RobertaConfig.from_pretrained(
    "/Absolute-path-to/BERTweet_base_transformers/config.json"
)
BERTweet = RobertaModel.from_pretrained(
    "/Absolute-path-to/BERTweet_base_transformers/model.bin",
    config=config
)
...and an error was displayed:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
242 if resolved_config_file is None:
--> 243 raise EnvironmentError
244 config_dict = cls._dict_from_json_file(resolved_config_file)
OSError:
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
2 frames
/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
250 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a {CONFIG_NAME} file\n\n"
251 )
--> 252 raise EnvironmentError(msg)
253
254 except json.JSONDecodeError:
OSError: Can't load config for '/Absolute-path-to/BERTweet_base_transformers/config.json'. Make sure that:
- '/Absolute-path-to/BERTweet_base_transformers/config.json' is a correct model identifier listed on 'https://huggingface.co/models'
- or '/Absolute-path-to/BERTweet_base_transformers/config.json' is the correct path to a directory containing a config.json file
I'm guessing the issue is that I need to replace '/Absolute-path-to' with something else, but if that's the case, what should it be replaced with? It's probably a very simple answer, but I need help.
First of all, you have to download the proper package, as described in the GitHub README. You downloaded the fairseq archive (BERTweet_base_fairseq.tar.gz), but the transformers code expects the transformers archive:
!wget https://public.vinai.io/BERTweet_base_transformers.tar.gz
!tar -xzvf BERTweet_base_transformers.tar.gz
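If you want to confirm what was extracted before wiring up any paths, listing the directory is enough (the file names shown in the comment are what the archive is expected to contain; adjust if yours differ):
# List the extracted files; the transformers archive should contain
# config.json, model.bin, bpe.codes and dict.txt
!ls -lh BERTweet_base_transformers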
After that, you can click the folder icon on the left side of the Colab screen to browse the downloaded files. Right-click on the BERTweet_base_transformers directory, choose "Copy path", and paste the path from your clipboard into your code:
config = RobertaConfig.from_pretrained(
    "/content/BERTweet_base_transformers/config.json"
)
BERTweet = RobertaModel.from_pretrained(
    "/content/BERTweet_base_transformers/model.bin",
    config=config
)
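If you also want to push a tweet through the model, the same archive ships the BPE codes and the vocabulary, which is what the fastBPE and Dictionary imports in your question are for. Below is a minimal sketch following the README's feature-extraction example; the file names bpe.codes and dict.txt and the sample tweet are assumptions, and note that the input text is expected to be tokenized already:
# Load the BPE encoder shipped with the archive (bpe.codes is assumed to be its file name)
parser = argparse.ArgumentParser()
parser.add_argument('--bpe-codes',
    default="/content/BERTweet_base_transformers/bpe.codes",
    required=False,
    type=str,
    help='path to fastBPE BPE codes'
)
args = parser.parse_args([])  # pass an empty list so Colab's own argv is ignored
bpe = fastBPE(args)

# Load the subword vocabulary (dict.txt is assumed to be its file name)
vocab = Dictionary()
vocab.add_from_file("/content/BERTweet_base_transformers/dict.txt")

# A sample tweet; the input text must already be tokenized
line = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :cry:"

# BPE-encode the line and add the <s> / </s> special tokens
subwords = '<s> ' + bpe.encode(line) + ' </s>'

# Map subword tokens to their indices in the dictionary
input_ids = vocab.encode_line(subwords, append_eos=False, add_if_not_exist=False).long().tolist()

# Convert to a batch of size 1 and extract features with the loaded model
all_input_ids = torch.tensor([input_ids], dtype=torch.long)
with torch.no_grad():
    features = BERTweet(all_input_ids)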