I am using sentence-transformers for semantic search, but sometimes it does not understand the contextual meaning and returns the wrong results, e.g., BERT has problems with context/semantic search in the Italian language.
By default the embedding vector of a sentence has 768 dimensions, so how do I increase that dimension so that the model can understand the contextual meaning more deeply?
code:
# Load the BERT Model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-mean-tokens')
# Setup a Corpus
# A corpus is a list with documents split by sentences.
sentences = ['Absence of sanity',
             'Lack of saneness',
             'A man is eating food.',
             'A man is eating a piece of bread.',
             'The girl is carrying a baby.',
             'A man is riding a horse.',
             'A woman is playing violin.',
             'Two men pushed carts through the woods.',
             'A man is riding a white horse on an enclosed ground.',
             'A monkey is playing drums.',
             'A cheetah is running behind its prey.']
# Each sentence is encoded as a 1-D vector with 768 dimensions
sentence_embeddings = model.encode(sentences)  # how to increase the vector dimension?
print('Sample BERT embedding vector - length', len(sentence_embeddings[0]))
print('Sample BERT embedding vector - note includes negative values', sentence_embeddings[0])
Unfortunately, the only way to INCREASE the dimension of the embedding in a meaningful way is to retrain the model. :( Any extra dimensions you bolt onto a pretrained model start out randomly initialized, so they add numbers, not meaning.
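For illustration, here is a minimal sketch of how you could stack a wider output layer on top of the pooled BERT output using the models module (the 1024 target size and the 'bert-base-uncased' backbone are arbitrary choices). The Dense layer is randomly initialized, which is exactly why the bigger vectors mean nothing until the whole stack is retrained:

from torch import nn
from sentence_transformers import SentenceTransformer, models

# Transformer backbone + mean pooling, similar to 'bert-base-nli-mean-tokens'
word_embedding_model = models.Transformer('bert-base-uncased', max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())

# Project the 768-dim pooled vector up to 1024 dims (randomly initialized!)
dense_model = models.Dense(in_features=pooling_model.get_sentence_embedding_dimension(),
                           out_features=1024,
                           activation_function=nn.Tanh())

model = SentenceTransformer(modules=[word_embedding_model, pooling_model, dense_model])
print(model.get_sentence_embedding_dimension())  # 1024, but untrained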
However, maybe this is not what you need... maybe you should consider fine-tuning an existing model instead:
I suggest you take a look at sentence-transformers from UKPLab. They have pretrained models for sentence embeddings in over 100 languages, Italian included, and the best part is that you can fine-tune them on your own data.
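As a rough sketch of what fine-tuning looks like (the multilingual checkpoint, the toy Italian pairs, and the hyperparameters below are all illustrative choices, not a recipe):

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# A multilingual checkpoint that already covers Italian (illustrative choice)
model = SentenceTransformer('distiluse-base-multilingual-cased-v1')

# Toy training data: sentence pairs with a similarity label in [0, 1]
train_examples = [
    InputExample(texts=['Un uomo sta mangiando.', 'Un uomo mangia del pane.'], label=0.8),
    InputExample(texts=['Un uomo sta mangiando.', 'Una donna suona il violino.'], label=0.1),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune; epochs/warmup_steps are placeholders, tune them on real data
model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1,
          warmup_steps=10)

With a few thousand labeled pairs from your own domain, this usually helps contextual matching far more than making the vectors longer would.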
Good Luck!