I have been asked to create three dimensional vector embeddings for a series of words. Although I understand what an embedding is and that word2vec
will be able to create the vector embeddings, I cannot find a resource that shows me how to create a three dimensional vector (all the resources show many more dimensions than this).
The format I have to create the file in is:
house 34444 0.3232 0.123213 1.231231
dog 14444 0.76762 0.76767 1.45454
which is in the format <token>\t<word_count>\t<vector_embedding_separated_by_spaces>
Can anyone point me towards a resource that will show me how to create the desired file format given some training text?
Once you've decided on a programming language, and word2vec library, its documentation will likely highlight a configurable parameter that lets you specify the dimensionality of the vectors it trains. So, you just need to change that parameter from its typical values , like 100
or 300
, to 3
.
(Note, though, that 3-dimensional word-vectors are unlikely to show the interesting & useful property of higher-dimensional vectors.)
Once you've used such a library to create the vectors-in-memory, writing them out in your specified format becomes just a file-IO problem, unrelated to word2vec itself. In typical languages, you'd open a new file for writing, loop over your data printing each line properly, then close the file.
(To get a more detailed answer from StackOverflow, you'd want to pick a specific language/library, show what you've already tried with actual code, and show how the results/errors achieved fall short of your goal.)