tensorflow, gensim, word2vec

How to get all the weight updates from Word2Vec


I am interested not only in the final W0 and W1 (known to some as W and W'), but also in all the variations of these two matrices during training.

For now, I am using the gensim implementation, but compared to sklearn's, gensim's API is not very well organized in my opinion. Hence, I am open to moving to TensorFlow if need be, provided that getting access to these values would be possible (or easier) there.

I know I could hack the main code; my question is whether a function or variable for this already exists.


Solution

  • There's no specific API for seeing individual training-example updates or interim weights mid-training.

    But, as you've intuited, instead of calling train() once and letting it run all epochs and all learning-rate updates (as is recommended), you could call it one epoch at a time, supplying the appropriate incremental start_alpha and end_alpha yourself on each call. Between calls, you can inspect the word-vectors (aka "projection weights") and the hidden-to-output weights (syn1neg for the default negative-sampling mode, or syn1 for hierarchical softmax).

    If you need more fine-grained reporting, such as the individual per-example updates, you'd have to modify the source code to add the extra logging, callouts, etc. that you need.