machine-learning, nlp, artificial-intelligence, bert-language-model

How to increase the dimension (vector size) of a BERT sentence-transformers embedding


I am using sentence-transformers for semantic search, but sometimes it does not understand the contextual meaning and returns wrong results, e.g. BERT problem with context/semantic search in Italian language

By default, the sentence embedding vector has 768 dimensions, so how do I increase that dimension so that the model can understand contextual meaning in more depth?

code:

# Load the BERT Model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-mean-tokens')

# Setup a Corpus
# A corpus is a list with documents split by sentences.

sentences = ['Absence of sanity', 
             'Lack of saneness',
             'A man is eating food.',
             'A man is eating a piece of bread.',
             'The girl is carrying a baby.',
             'A man is riding a horse.',
             'A woman is playing violin.',
             'Two men pushed carts through the woods.',
             'A man is riding a white horse on an enclosed ground.',
             'A monkey is playing drums.',
             'A cheetah is running behind its prey.']

# Each sentence is encoded as a 1-D vector with 768 dimensions
sentence_embeddings = model.encode(sentences)  ### how to increase the vector dimension?

print('Sample BERT embedding vector - length', len(sentence_embeddings[0]))

print('Sample BERT embedding vector - note includes negative values', sentence_embeddings[0])
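You can also ask the model itself for its output dimensionality, using the get_sentence_embedding_dimension() method from the sentence-transformers API. A minimal check, assuming the model object loaded above:

# Confirm the embedding size reported by the model (768 for bert-base models)
print('Embedding dimension:', model.get_sentence_embedding_dimension())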

Solution

  • Unfortunately, the only way to INCREASE the dimension of the embedding in a meaningful way is to retrain the model. :(

    However, maybe this is not what you need... maybe you should consider fine-tuning a model instead:

    I suggest you take a look at sentence-transformers from UKPLab. They have pretrained models for sentence embeddings in over 100 languages. The best part is that you can fine-tune those models; a short sketch follows below.

    Good Luck!
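
To illustrate both points, here is a minimal sketch that bolts a Dense projection layer onto a BERT encoder to enlarge the output dimension and then fine-tunes it on labeled sentence pairs using the classic sentence-transformers training API. Note that the Dense layer alone adds no semantic information until it is trained; the base model name, example pairs, and similarity scores here are invented for illustration:

# Sketch: enlarge the embedding dimension with a Dense layer, then fine-tune.
# The training pairs and scores below are made up; substitute real data.
from torch import nn
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

# Build a model: BERT encoder -> mean pooling -> Dense projection to 1024 dims
word_embedding_model = models.Transformer('bert-base-uncased')
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
dense_model = models.Dense(
    in_features=pooling_model.get_sentence_embedding_dimension(),
    out_features=1024,  # the new, larger embedding dimension
    activation_function=nn.Tanh())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model, dense_model])

# A few labeled pairs (similarity scores in [0, 1]) -- replace with real data
train_examples = [
    InputExample(texts=['Absence of sanity', 'Lack of saneness'], label=0.9),
    InputExample(texts=['A man is eating food.', 'A monkey is playing drums.'], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune; without this step the Dense layer is random and adds nothing
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)

print(model.get_sentence_embedding_dimension())  # 1024 after the projection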