I have trained my English model following this notebook (https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-06-sequence-tagging.ipynb). I am able to save my trained model and run it with no problem.
However, I now need to run it OFFLINE, and it does not work. I understand that I need to download the model files and do something similar to what is done here:
https://github.com/huggingface/transformers/issues/136
However, I am not able to understand where I need to change the settings of ktrain.
This is what I run:
ktrain.load_predictor('Functions/my_english_nermodel')
and this is the error I get:
Traceback (most recent call last):
File "Z:\Functions\NER.py", line 155, in load_bert
reloaded_predictor= ktrain.load_predictor('Z:/Functions/my_english_nermodel')
File "C:\Program Files\Python37\lib\site-packages\ktrain\core.py", line 1316, in load_predictor
preproc = pickle.load(f)
File "C:\Program Files\Python37\lib\site-packages\ktrain\text\ner\anago\preprocessing.py", line 76, in __setstate__
if self.te_model is not None: self.activate_transformer(self.te_model, layers=self.te_layers)
File "C:\Program Files\Python37\lib\site-packages\ktrain\text\ner\anago\preprocessing.py", line 100, in activate_transformer
self.te = TransformerEmbedding(model_name, layers=layers)
File "C:\Program Files\Python37\lib\site-packages\ktrain\text\preprocessor.py", line 1095, in __init__
self.tokenizer = self.tokenizer_type.from_pretrained(model_name)
File "C:\Program Files\Python37\lib\site-packages\transformers\tokenization_utils.py", line 903, in from_pretrained
return cls._from_pretrained(*inputs, **kwargs)
File "C:\Program Files\Python37\lib\site-packages\transformers\tokenization_utils.py", line 1008, in _from_pretrained
list(cls.vocab_files_names.values()),
OSError: Model name 'bert-base-uncased' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed 'bert-base-dutch-cased' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
Process finished with exit code 1
More generally, the transformers-based pretrained models are downloaded to <home_directory>/.cache/torch/transformers. For instance, on Linux, this will be /home/<user_name>/.cache/torch/transformers.
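To check where the cache lives on your system, a small sketch like this can help. Note that the exact location depends on your transformers version and OS (newer versions of the library moved the cache to ~/.cache/huggingface), so treat the path below as an assumption based on the version used here:

```python
import os

# Default cache directory used by older transformers versions
# (an assumption -- newer releases use ~/.cache/huggingface instead).
cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "torch", "transformers")

print(cache_dir)
print(os.path.isdir(cache_dir))  # True only if models were downloaded here
```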
As indicated in the answer above, to reload the ktrain predictor on a machine with no internet access (for ktrain models that use models from the transformers library), you'll need to copy the model files in that folder to the same location on the new machine.
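A minimal sketch of that copy step, assuming the default cache path above and a staging location of your choosing (both paths are illustrative, not fixed by ktrain):

```python
import os
import shutil

def copy_transformers_cache(src, dst):
    """Copy the downloaded transformers cache to a staging location.

    On the offline machine, place these files back at the same path
    (e.g. ~/.cache/torch/transformers) so ktrain.load_predictor()
    can find them without an internet connection.
    """
    shutil.copytree(src, dst, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+

# Example usage (paths are assumptions -- adjust to your environment):
# copy_transformers_cache(
#     os.path.expanduser("~/.cache/torch/transformers"),
#     "/mnt/usb/transformers_cache",
# )
```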