git git-submodules bert-language-model

Error "version" not found after adding bert as a submodule to my git repo


After adding BERT as a submodule, I cannot use it: the version info is missing from the config file. These are the main steps:

1- I've used the git submodule add https://huggingface.co/bert-base-multilingual-uncased command to add it as a submodule to my repo.
2- I've put it in a directory named pretrained/mbert/.
3- I've used the following code to use it:

from sentence_transformers import SentenceTransformer


def embed_text(sentences, pretrained="../pretrained/mbert/bert-base-multilingual-cased"): 
    """
    Computes the embeddings of the different sentences in input.
    :param sentences: list, of sentences
    :param pretrained: str, the pretrained bert model
    :return: list, of list
    """

    model = SentenceTransformer(pretrained) 
    sentence_embeddings = model.encode(sentences)

    return [arr.tolist() for arr in sentence_embeddings]

I've got the following error:

model = SentenceTransformer(pretrained)  
  File "C:\ProgramData\Anaconda3\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 104, in __init__
    if config['__version__'] > __version__:
KeyError: '__version__'

Solution

  • A model downloaded from huggingface.co cannot be used directly this way. See this issue: the folder layout of a pretrained model (PTM) trained with transformers differs from that of one trained with sentence-transformers.

    For a PTM trained with sentence-transformers, the folder should contain these files:
    0_Transformer/
    1_Pooling/
    config.json
    modules.json
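
    The check implied by that listing can be sketched in a few lines. This is only an illustration, not the library's own logic: the helper names (missing_entries, has_version_key, load_sentence_model) are made up for this example, and the fallback assumes a sentence-transformers version that exposes models.Transformer and models.Pooling. Using the Pooling module's default settings is also an assumption; it may not be the right choice for every model.

```python
import json
from pathlib import Path

# Entry names taken from the folder listing above.
EXPECTED_ENTRIES = ("0_Transformer", "1_Pooling", "config.json", "modules.json")


def missing_entries(model_dir):
    """Return the expected entries that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


def has_version_key(model_dir):
    """Return True if config.json exists and contains the '__version__' key."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return False
    with config_path.open(encoding="utf-8") as handle:
        return "__version__" in json.load(handle)


def load_sentence_model(model_dir):
    """Load model_dir directly if it has the layout above, otherwise wrap it."""
    # Imported lazily so the two checks above work without the library installed.
    from sentence_transformers import SentenceTransformer, models

    if not missing_entries(model_dir) and has_version_key(model_dir):
        return SentenceTransformer(str(model_dir))

    # Plain transformers checkpoint: put a pooling layer on top of it.
    # The Pooling module's default settings are used here (an assumption).
    word_embedding_model = models.Transformer(str(model_dir))
    pooling_model = models.Pooling(
        word_embedding_model.get_word_embedding_dimension()
    )
    return SentenceTransformer(modules=[word_embedding_model, pooling_model])
```

    Calling missing_entries on the submodule directory from the question should list the entries it lacks, which is a quick way to confirm that the KeyError comes from the folder layout and not from the git submodule itself.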