After adding BERT as a git submodule, I cannot load it: the version info is missing from the config file. These are the main steps:
1- I ran the command git submodule add https://huggingface.co/bert-base-multilingual-uncased to add the model as a submodule of my repository.
2- I put it in a directory named pretrained/mbert/
3- I used the following code to load it:
from sentence_transformers import SentenceTransformer


def embed_text(sentences, pretrained="../pretrained/mbert/bert-base-multilingual-cased"):
    """
    Computes the embeddings of the input sentences.
    :param sentences: list of str, the sentences to embed
    :param pretrained: str, path to the pretrained BERT model
    :return: list of lists of float, one embedding per sentence
    """
    model = SentenceTransformer(pretrained)
    sentence_embeddings = model.encode(sentences)
    # Convert each NumPy array to a plain Python list.
    return [arr.tolist() for arr in sentence_embeddings]
I got the following error:
    model = SentenceTransformer(pretrained)
  File "C:\ProgramData\Anaconda3\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 104, in __init__
    if config['__version__'] > __version__:
KeyError: '__version__'
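A model downloaded from huggingface.co cannot be passed to SentenceTransformer directly. See this issue: the folder layout of a pretrained model (PTM) trained with transformers differs from that of one trained with sentence-transformers. As the traceback shows, SentenceTransformer looks up a __version__ entry in the folder's config, and the config.json that ships with the Hugging Face model has no such entry, hence the KeyError.
For a PTM trained with sentence-transformers, the folder should contain these files: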
0_Transformer/
1_Pooling/
config.json
modules.json
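
One possible workaround, sketched below, is to wrap the plain Hugging Face folder in a Transformer module plus a Pooling module instead of passing the path to SentenceTransformer directly. This assumes a sentence-transformers release that provides models.Transformer and models.Pooling (very old releases used model-specific classes instead). The helper names is_sentence_transformers_folder and load_sentence_model are hypothetical, and mean pooling is an assumption, not something the original model prescribes.

```python
import os


def is_sentence_transformers_folder(path):
    """Return True if `path` contains modules.json, i.e. the
    sentence-transformers layout listed above (hypothetical helper)."""
    return os.path.isfile(os.path.join(path, "modules.json"))


def load_sentence_model(path):
    """Load `path` as a SentenceTransformer, wrapping a plain
    Hugging Face checkpoint if needed (hypothetical helper)."""
    # Imported lazily so the layout check above works without the library.
    from sentence_transformers import SentenceTransformer, models

    if is_sentence_transformers_folder(path):
        # Already in sentence-transformers format: load it directly.
        return SentenceTransformer(path)

    # Plain transformers checkpoint: build the two modules by hand.
    word_embedding_model = models.Transformer(path)
    pooling_model = models.Pooling(
        word_embedding_model.get_word_embedding_dimension(),
        pooling_mode_mean_tokens=True,   # assumption: mean pooling
        pooling_mode_cls_token=False,
        pooling_mode_max_tokens=False,
    )
    return SentenceTransformer(modules=[word_embedding_model, pooling_model])
```

With this in place, embed_text could call load_sentence_model(pretrained) instead of SentenceTransformer(pretrained). Note that a BERT checkpoint wrapped this way has not been fine-tuned for sentence similarity, so the embeddings may be weaker than those of a model trained with sentence-transformers.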