Tags: nlp, word2vec, word-embedding

How to get three dimensional vector embedding for a list of words


I have been asked to create three dimensional vector embeddings for a series of words. Although I understand what an embedding is and that word2vec will be able to create the vector embeddings, I cannot find a resource that shows me how to create a three dimensional vector (all the resources show many more dimensions than this).

The format I have to create the file in is:

house    34444     0.3232 0.123213 1.231231
dog    14444    0.76762 0.76767 1.45454

which is in the format <token>\t<word_count>\t<vector_embedding_separated_by_spaces>

Can anyone point me towards a resource that will show me how to create the desired file format given some training text?


Solution

  • Once you've decided on a programming language and a word2vec library, its documentation will likely highlight a configurable parameter that lets you specify the dimensionality of the vectors it trains. So, you just need to change that parameter from its typical values, like 100 or 300, to 3.

    (Note, though, that 3-dimensional word-vectors are unlikely to show the interesting & useful properties of higher-dimensional vectors.)

    Once you've used such a library to create the vectors in memory, writing them out in your specified format is just a file-IO problem, unrelated to word2vec itself. In typical languages, you'd open a new file for writing, loop over your data writing each line in the required format, then close the file.

    (To get a more detailed answer from StackOverflow, you'd want to pick a specific language/library, show what you've already tried with actual code, and show how the results/errors achieved fall short of your goal.)