Similar to the question linked below, I would like to access the input and output matrices WI and WO. However, I am using the BlazingText implementation of Word2Vec.
After fitting, the model.tar.gz artifact contains: vectors.txt, which corresponds to WI; and the binary model.bin for hosting.
Does anyone know if it's possible to access the WO matrix when using BlazingText?
How can I access output embedding(output vector) in gensim word2vec?
Thanks in advance
From a quick glance through the BlazingText docs and example notebooks, I don't see Amazon choosing to expose any access to the 'output' vector weights, nor any examples which indirectly reveal where they might be stored.
They might be encoded somewhere in the vectors.bin, but there seem to be no docs for that format. If the source code for BlazingText (or even a few key parts) were available, it might be straightforward to figure out whether (and where) those weights get stored... but it seems no source code is available.
So it may be the case that only Amazon engineers with proprietary info can answer that question.
Are you sure you need to use Amazon's BlazingText rather than some better-documented, source-available alternate implementation?
Its main benefit seems to be training speed, which may only be decisive for extra-large training sets, or situations where short-lag reindexing occurs regularly. (In cases where some large historic corpus is used for training once, then the vectors used for many downstream purposes, a one-time training session that runs "over lunch" or "overnight" is often fast enough.)