I need to change the embedding matrix of a word2vec model after training it. Here is an example:
import numpy as np
from gensim.models import Word2Vec
w2v=Word2Vec(sentences,size=100,window=1,min_count=1,negative=15,iter=3)
w2v.save("word2vec.model")
#Getting the embedding matrix
embedding_matrix=w2v.wv.vectors
for p in ("mujer", "hombre"):
    result=w2v.wv.similar_by_word(p)
    print("Similar words from '",p,"': ",result[:3])
#Trying to set the weights matrix
w2v.wv.vectors=np.random.rand(w2v.wv.vectors.shape[0],w2v.wv.vectors.shape[1])
print()
for p in ("mujer", "hombre"):
    result=w2v.wv.similar_by_word(p)
    print("Similar words from '",p,"': ",result[:3])
And here is the output:
Similar words from ' mujer ': [('honra', 0.9999152421951294), ('muerte', 0.9998959302902222), ('contento', 0.999891459941864)]
Similar words from ' hombre ': [('valor', 0.9999064207077026), ('nombre', 0.9998984336853027), ('llegar', 0.9998887181282043)]
Similar words from ' mujer ': [('honra', 0.9999152421951294), ('muerte', 0.9998959302902222), ('contento', 0.999891459941864)]
Similar words from ' hombre ': [('valor', 0.9999064207077026), ('nombre', 0.9998984336853027), ('llegar', 0.9998887181282043)]
As you can see, I get the same predictions even though I replaced the embedding matrix with random numbers.
I can't find any method in the documentation to make this change.
Is it possible?
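I found the solution myself. The similarity queries work on a cached L2-normalized copy of the matrix (vectors_norm), so that cache has to be rebuilt with init_sims() after setting the array.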
w2v=Word2Vec(sentences,size=100,window=1,min_count=1,negative=15,iter=3)
w2v.save("word2vec.model")
#Getting the embedding matrix
embedding_matrix=w2v.wv.vectors
for p in ("mujer", "hombre"):
    result=w2v.wv.similar_by_word(p)
    print("Similar words from '",p,"': ",result[:3])
#Setting new values on the weights matrix
w2v.wv.vectors=np.random.rand(w2v.wv.vectors.shape[0],w2v.wv.vectors.shape[1])
#Clear the cached L2-normalized matrix; init_sims() only rebuilds it when it is missing (gensim 3.x behaviour, as far as I can tell)
w2v.wv.vectors_norm=None
#This line recomputes the L2 normalization over the new embedding matrix
w2v.wv.init_sims(replace=False)
print()
for p in ("mujer", "hombre"):
    result=w2v.wv.similar_by_word(p)
    print("Similar words from '",p,"': ",result[:3])
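To show why the first attempt returned identical results, here is a minimal sketch of the caching behaviour in plain Python. ToyVectors and its methods are hypothetical stand-ins written for this illustration, not gensim's actual classes. The sketch assumes only that similarity lookups read from a lazily built, unit-length copy of the vectors, which is how the behaviour above appears to work.

```python
import math


def unit(row):
    # Scale a vector to length 1 (L2 normalization)
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]


class ToyVectors:
    """Hypothetical stand-in for a keyed-vectors object; not gensim's class."""

    def __init__(self, words, vectors):
        self.words = list(words)
        self.vectors = [list(v) for v in vectors]
        self.vectors_norm = None  # cache of unit-length rows, built lazily

    def init_sims(self):
        # Build the cache only when it is missing, so an existing cache
        # survives even if self.vectors has been replaced in the meantime
        if self.vectors_norm is None:
            self.vectors_norm = [unit(row) for row in self.vectors]

    def most_similar(self, word):
        # Cosine similarity is the dot product of the cached unit vectors
        self.init_sims()
        query = self.vectors_norm[self.words.index(word)]
        scores = [
            (w, sum(a * b for a, b in zip(query, row)))
            for w, row in zip(self.words, self.vectors_norm)
            if w != word
        ]
        return max(scores, key=lambda pair: pair[1])[0]


toy = ToyVectors(["mujer", "honra", "valor"],
                 [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
before = toy.most_similar("mujer")   # the cache is built here

# Swap the vectors of the two neighbours
toy.vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
stale = toy.most_similar("mujer")    # still answered from the old cache

toy.vectors_norm = None              # invalidate the cache
fresh = toy.most_similar("mujer")    # recomputed from the new vectors

print(before, stale, fresh)          # honra honra valor
```

The second query ignores the new vectors because the cache already exists. Only after the cache is cleared does the result change.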