I have built a model with the gensim library and am trying to get the vector for a word that is not present in the vocabulary, but I get an error. I want to handle this error in the best way. If I could get a vector for a word that is not present in the model, that would be perfect.
The code
from gensim.models import KeyedVectors

model = KeyedVectors.load('nice.model')
token_vector = model.wv['bla bla bla']
Error
  File "/home/ahmed/PycharmProjects/WebScarping/venv/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 421, in get_index
    raise KeyError(f"Key '{key}' not present")
KeyError: "Key 'hmed' not present"
Please help me resolve this error.
If the token is not present in the model, it can't give you a vector for it.
Your model doesn't have a vector for the (pseudo-)word 'bla bla bla'; all it can do is report that.
You could avoid the exception by pre-checking whether the token is present, and only requesting it if present:
if token in model.wv:
    token_vector = model.wv[token]
else:
    # whatever your next-best step is when a vector is not available
    ...
Or, you could catch the exception:
try:
    token_vector = model.wv[token]
except KeyError:
    # whatever your next-best step is when a vector is not available
    ...
But there's no magic way to create a good vector for an unknown token. You'll have to ignore such words, make up some stand-in value, or find some other project-appropriate workaround.
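As a rough sketch of the "ignore or stand in" options, the helper below collects vectors for a list of tokens and reports which ones were unknown. The name `vectors_for_tokens` and the `stand_in` parameter are made up for illustration; the only assumption about the vector store is that it supports `in` and `[]`, as `model.wv` does, so a plain dict is used here as a toy substitute.

```python
from typing import Any, List, Optional, Tuple


def vectors_for_tokens(
    keyed_vectors: Any,
    tokens: List[str],
    stand_in: Optional[Any] = None,
) -> Tuple[List[Any], List[str]]:
    """Return (vectors, unknown_tokens) for the given tokens.

    Unknown tokens are skipped when stand_in is None; otherwise each
    unknown token contributes the stand_in value instead.
    """
    vectors = []
    unknown = []
    for token in tokens:
        if token in keyed_vectors:
            vectors.append(keyed_vectors[token])
        else:
            unknown.append(token)
            if stand_in is not None:
                vectors.append(stand_in)
    return vectors, unknown


# Toy substitute for model.wv (hypothetical data, not a real model).
toy_vectors = {"hello": [0.1, 0.2], "world": [0.3, 0.4]}

# Option 1: ignore unknown tokens.
found, missing = vectors_for_tokens(toy_vectors, ["hello", "bla", "world"])

# Option 2: use a stand-in (here a zero vector) for unknown tokens.
padded, _ = vectors_for_tokens(
    toy_vectors, ["hello", "bla", "world"], stand_in=[0.0, 0.0]
)
```

Whether skipping or padding is the better choice depends on what the vectors are used for downstream; a zero vector is only one possible stand-in, and the `missing` list makes it easy to log how often the fallback is hit.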
(If you have sufficient training data with varied examples of the token's real usage, you could train a model that includes the token. You could also consider finding or training a word2vec variant model like FastText, which can synthesize guess-vectors for unknown tokens based on which substrings they might share with words learned in training, but such vectors may be quite poor in quality.)
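To make the substring idea concrete, here is a small pure-Python sketch of the character n-grams that a FastText-style model works with. It only illustrates which fragments an unknown token shares with a known word; it is not FastText's actual implementation, which hashes n-grams into buckets and combines learned n-gram vectors. The function name `char_ngrams` is made up for this example, and the 3 to 6 character range is assumed as a typical default.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of word, wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.add(wrapped[start:start + n])
    return grams


# "token" plays the role of a word seen in training,
# "tokens" the role of an unknown token (both chosen for illustration).
known = char_ngrams("token")
unknown = char_ngrams("tokens")

# Fragments the unknown token has in common with the known word.
shared = known & unknown
```

The more fragments an unknown token shares with well-trained words, the more the synthesized vector has to go on; a token with few or no shared fragments gets a vector that is close to noise.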