Poincare embeddings: building transitive closures from WordNet

I'd like to replicate Figure 2 in Poincaré Embeddings for Learning Hierarchical Representations, namely the Poincaré embeddings of the "mammal" subtree of WordNet.

First, I construct the transitive closure needed to represent the graph. Following these docs and this SO answer, I do the following to construct the relations:

from nltk.corpus import wordnet as wn

root    = wn.synset('mammal.n.01')
words   = list(set([w for s in root.closure(lambda s: s.hyponyms()) for w in s.lemma_names()]))
rname   = root.name().split('.')[0]
closure = [(word, rname) for word in words]

Then I use Gensim's PoincareModel to compute the embeddings. Based on the example relations in Gensim's documentation, e.g.

relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal'), ('gib', 'cat')]

I infer that the hypernym needs to be to the right. Here is the model fitting code:
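One way to sanity-check that orientation on any list of pairs, assuming (as the documentation example suggests) that each tuple is read as (hyponym, hypernym): the top of the hierarchy should appear only on the right-hand side. `top_level_nodes` is a hypothetical helper for illustration, not part of Gensim.

```python
def top_level_nodes(relations):
    """Return nodes that appear as a hypernym (right) but never as a hyponym (left).

    Hypothetical helper: with (hyponym, hypernym) ordering, these are the
    candidate roots of the hierarchy.
    """
    children = {child for child, _ in relations}
    parents = {parent for _, parent in relations}
    return sorted(parents - children)


relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal'), ('gib', 'cat')]
print(top_level_nodes(relations))  # ['cat', 'mammal', 'marsupial']
```

'marsupial' shows up here only because the documentation snippet has no ('marsupial', 'mammal') pair. On the mammal data only 'mammal' should remain; if the pairs were reversed, every leaf would surface instead.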

from gensim.models.poincare import PoincareModel
from gensim.viz.poincare import poincare_2d_visualization

model = PoincareModel(closure, size=2, negative=0)
model.train(epochs=50)

fig = poincare_2d_visualization(model, closure, 'WordNet Poincare embeddings')
fig.show()

However, the result is obviously not correct in that it looks nothing like the paper. What am I doing wrong?

Solution

I think the main issue here stems from this line:

closure = [(word, rname) for word in words]

You are generating a list in which every word is connected only to rname, which is "mammal". That is, you get ("columbian_mammoth", "mammal") but are missing the intermediate steps ("columbian_mammoth", "mammoth"), ("mammoth", "elephant"), ("elephant", "proboscidean") and so on, so the model never sees how the words relate to each other below the root.
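To make the difference concrete, here is a toy sketch on a hand-written parent map standing in for WordNet. `PARENT`, `star_pairs` and `parent_pairs` are illustrative names, and the chain is simplified (real WordNet may have extra levels between proboscidean and mammal).

```python
# Hypothetical, simplified parent map (child -> parent), not extracted from WordNet.
PARENT = {
    'columbian_mammoth': 'mammoth',
    'mammoth': 'elephant',
    'elephant': 'proboscidean',
    'proboscidean': 'mammal',
}
ROOT = 'mammal'


def star_pairs(words, root):
    """What the question builds: every word linked only to the root."""
    return [(w, root) for w in words]


def parent_pairs(parent):
    """Direct (hyponym, hypernym) edges, the kind append_pairs produces."""
    return [(child, par) for child, par in parent.items()]


words = sorted(PARENT)  # every non-root node
print(star_pairs(words, ROOT))  # four pairs, all pointing at 'mammal'
print(parent_pairs(PARENT))     # the chain columbian_mammoth -> ... -> mammal
```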

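I suggest a recursive function, append_pairs, that walks down the hyponyms and records every (child, parent) edge. I also adjusted the arguments to PoincareModel and poincare_2d_visualization: negative=10 instead of negative=0 (without negative samples the model has nothing to push apart), epochs=20, and num_nodes=None so that edges are drawn for all nodes rather than a subset.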
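On the negative argument: here is a plain-Python sketch of the per-relation loss from the Poincaré embeddings paper. The function names are mine, and I'm assuming Gensim optimizes the same softmax-over-negatives form. With an empty negative list the softmax has a single candidate, so the loss is zero wherever the points sit and training gets no signal.

```python
import math


def _sq_norm(x):
    """Squared Euclidean norm of a vector given as a sequence of floats."""
    return sum(c * c for c in x)


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    diff = [a - b for a, b in zip(u, v)]
    return math.acosh(1 + 2 * _sq_norm(diff) / ((1 - _sq_norm(u)) * (1 - _sq_norm(v))))


def relation_loss(u, v, negatives):
    """Loss for one relation (u, v) against a list of negative points.

    Equals -log( exp(-d(u, v)) / sum of exp(-d(u, c)) over c in [v] + negatives ).
    """
    candidates = [v] + list(negatives)
    denom = sum(math.exp(-poincare_distance(u, c)) for c in candidates)
    positive = math.exp(-poincare_distance(u, v))
    return math.log(denom) - math.log(positive)


u, v = (0.1, 0.2), (0.5, -0.3)
print(relation_loss(u, v, []))                         # 0.0: v is the only candidate
print(relation_loss(u, v, [(0.0, 0.0), (-0.4, 0.1)]))  # positive once negatives compete
```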

from nltk.corpus import wordnet as wn
from gensim.models.poincare import PoincareModel
from gensim.viz.poincare import poincare_2d_visualization


def simple_name(r):
    # 'mammal.n.01' -> 'mammal'
    return r.name().split('.')[0]


def append_pairs(my_root, pairs):
    # Recursively add a (hyponym, hypernym) edge for every direct child of my_root.
    for w in my_root.hyponyms():
        pairs.append((simple_name(w), simple_name(my_root)))
        append_pairs(w, pairs)
    return pairs


if __name__ == '__main__':
    root = wn.synset('mammal.n.01')

    # Direct parent-child edges of the mammal subtree, hypernym on the right.
    relations = append_pairs(root, [])

    model = PoincareModel(relations, size=2, negative=10)
    model.train(epochs=20)

    # num_nodes=None: plot edges for all nodes instead of a subset.
    fig = poincare_2d_visualization(model, relations, 'WordNet Poincare embeddings', num_nodes=None)
    fig.show()

The result is not yet as clean as the figure in the paper, but the clustering is now visible.
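One remaining difference from the paper, which may account for part of the gap: the paper trains on the transitive closure of the hierarchy, i.e. every (descendant, ancestor) pair, whereas append_pairs only emits direct parent links (and repeats a pair if a synset is reachable along more than one path). Below is a minimal sketch of the full closure. `transitive_closure` is my own hypothetical helper; it takes a generic `children` callable so it can be checked on a toy tree without WordNet.

```python
from collections import defaultdict, deque


def transitive_closure(root, children, name=str):
    """Return sorted (descendant, ancestor) name pairs for the subtree under root."""
    # 1. Breadth-first walk collecting each node's parents inside the subtree.
    parents = defaultdict(set)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children(node):
            parents[child].add(node)
            if child not in seen:
                seen.add(child)
                queue.append(child)

    # 2. Ancestors of a node = its parents plus their ancestors (memoised).
    memo = {}

    def ancestors(node):
        if node not in memo:
            result = set()
            for p in parents[node]:
                result.add(p)
                result |= ancestors(p)
            memo[node] = result
        return memo[node]

    # 3. Emit every (descendant, ancestor) pair by name; the set drops duplicates.
    return sorted({(name(n), name(a)) for n in seen if n != root for a in ancestors(n)})


# Hypothetical toy tree (parent -> children), not taken from WordNet.
TREE = {
    'mammal': ['placental', 'marsupial'],
    'placental': ['elephant', 'bat'],
    'elephant': ['mammoth'],
    'marsupial': ['kangaroo'],
}
pairs = transitive_closure('mammal', lambda n: TREE.get(n, []))
print(len(pairs))  # 11
```

With NLTK this would be called as transitive_closure(wn.synset('mammal.n.01'), lambda s: s.hyponyms(), simple_name); passing lambda s: s.name() instead keeps synsets that share a first lemma apart. Gensim's docstring for poincare_2d_visualization describes its tree argument as the direct edges of the dataset, so one option is to train on the closure and still plot with the output of append_pairs.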