gensim word2vec

Infer "shapes", or infer analogous relations in Word2Vec


Does Gensim's Word2Vec offer a way to infer analogous relationships, that is, pairs with the "same shape" as ones already found?

E.g., starting from king, queen

I would like to get other pairs related by the same male/female gender relationship.

In other words: most_similar(positive=['king', X], negative=['queen']) -> Y

I would like to find as many (X, Y) pairs as possible.


Solution

  • There's no built-in facility resembling what I think you're asking.

    But you are of course free to loop over any number of candidate words (as X, or as the other arguments to most_similar()) and see which top neighbors are reported (candidate Y values), perhaps applying some similarity threshold.

    Note that the famous man:king :: woman:_?_ analogy is usually presented to a word2vec model in Gensim as most_similar(positive=['king', 'woman'], negative=['man']), which roughly computes king - man + woman = _?_. I'm not sure your alternate formulation, effectively king - queen + X = Y, has an analogical meaning for arbitrary X or responses Y.

    Also note that most_similar() suppresses any candidate words that were already passed as arguments in positive or negative. Often, the result of the 'arithmetic' is still closer to one of the input words than to anything else, but that word won't be reported; the next-best words are shown instead.
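The candidate loop and the input-word suppression described above can be sketched as follows. This is only an illustration, not a built-in Gensim feature: the helper names `find_analogous_pairs` and `unfiltered_neighbors`, the default threshold of 0.6, and the candidate list are all assumptions, and `kv` is assumed to behave like a Gensim 4.x `KeyedVectors` object (for example `model.wv`) in which every candidate word is present.

```python
from typing import Iterable, List, Tuple


def find_analogous_pairs(kv, a: str, b: str, candidates: Iterable[str],
                         threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """For each candidate X, take the top neighbor Y of (a - b + X).

    `kv` is assumed to expose most_similar() like gensim's KeyedVectors.
    The threshold is an arbitrary illustrative cutoff on cosine similarity.
    """
    pairs = []
    for x in candidates:
        if x in (a, b):
            continue  # skip the seed words themselves
        # most_similar returns a list of (word, cosine_similarity) tuples
        y, score = kv.most_similar(positive=[a, x], negative=[b], topn=1)[0]
        if score >= threshold:
            pairs.append((x, y, score))
    # strongest matches first
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def unfiltered_neighbors(kv, positive, negative, topn=5):
    """Do the vector arithmetic by hand so input words are NOT filtered out.

    Uses unit-normalized vectors, then queries by vector instead of by word,
    so the input words themselves can appear among the reported neighbors.
    """
    vec = (sum(kv.get_vector(w, norm=True) for w in positive)
           - sum(kv.get_vector(w, norm=True) for w in negative))
    return kv.similar_by_vector(vec, topn=topn)
```

With a trained model, one might call `find_analogous_pairs(model.wv, 'king', 'queen', ['woman', 'aunt', 'actress'])` and then inspect the surviving (X, Y, score) triples by hand, since a high similarity score does not guarantee that the pair expresses the intended relation. Comparing `unfiltered_neighbors(model.wv, ['king', 'woman'], ['man'])` against the equivalent `most_similar()` call shows whether an input word was the true nearest neighbor.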