Tags: python, tensorflow, offline, named-entity-recognition, bert-language-model

How do I use ktrain for NER offline?


I have trained my English model following this notebook (https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-06-sequence-tagging.ipynb). I am able to save my pretrained model and run it with no problem.

However, I now need to run it OFFLINE, and it is not working. I understand that I need to download the model files and do something similar to what is described here:

https://github.com/huggingface/transformers/issues/136

However, I cannot figure out where I need to change the settings in ktrain.

I run this:

ktrain.load_predictor('Functions/my_english_nermodel')

and this is the error I get:

Traceback (most recent call last):
  File "Z:\Functions\NER.py", line 155, in load_bert
    reloaded_predictor= ktrain.load_predictor('Z:/Functions/my_english_nermodel')
  File "C:\Program Files\Python37\lib\site-packages\ktrain\core.py", line 1316, in load_predictor
    preproc = pickle.load(f)
  File "C:\Program Files\Python37\lib\site-packages\ktrain\text\ner\anago\preprocessing.py", line 76, in __setstate__
    if self.te_model is not None: self.activate_transformer(self.te_model, layers=self.te_layers)
  File "C:\Program Files\Python37\lib\site-packages\ktrain\text\ner\anago\preprocessing.py", line 100, in activate_transformer
    self.te = TransformerEmbedding(model_name, layers=layers)
  File "C:\Program Files\Python37\lib\site-packages\ktrain\text\preprocessor.py", line 1095, in __init__
    self.tokenizer = self.tokenizer_type.from_pretrained(model_name)
  File "C:\Program Files\Python37\lib\site-packages\transformers\tokenization_utils.py", line 903, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\transformers\tokenization_utils.py", line 1008, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'bert-base-uncased' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed 'bert-base-dutch-cased' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

Process finished with exit code 1

Solution

  • More generally, transformers-based pretrained models are downloaded to <home_directory>/.cache/torch/transformers. For instance, on Linux this will be /home/<user_name>/.cache/torch/transformers.

    To reload the ktrain predictor on a machine with no internet access (for ktrain models that use models from the transformers library), you'll need to copy the model files in that folder to the same location on the new machine.
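
    For concreteness, here is a minimal sketch of that copy step in Python. The Z:/ paths below are placeholders based on the traceback above, and the sample sentence is made up; adjust everything for your environment:

    import os
    import shutil
    import ktrain

    # On the machine WITH internet access: locate the transformers cache
    # that was populated when the model was first downloaded.
    src_cache = os.path.expanduser("~/.cache/torch/transformers")

    # Copy it somewhere portable (USB drive, network share, etc.).
    shutil.copytree(src_cache, "Z:/transformers_cache_backup")

    # On the OFFLINE machine: restore the cache to the same location in the
    # user's home directory, then reload the predictor as usual.
    dst_cache = os.path.expanduser("~/.cache/torch/transformers")
    if not os.path.isdir(dst_cache):
        shutil.copytree("Z:/transformers_cache_backup", dst_cache)

    predictor = ktrain.load_predictor("Z:/Functions/my_english_nermodel")
    print(predictor.predict("Paris is the capital of France."))

    Once the cached files are in place, load_predictor should no longer try to fetch 'bert-base-uncased' from the internet.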