I'd like to replicate Figure 2 in Poincaré Embeddings for Learning Hierarchical Representations, namely: Poincare embeddings from the "mammal" subtree of WordNet.
First, I construct the transitive closure needed to represent the graph. Following these docs and this SO answer, I do the following to construct the relations:
from nltk.corpus import wordnet as wn
root = wn.synset('mammal.n.01')
words = list(set([w for s in root.closure(lambda s: s.hyponyms()) for w in s.lemma_names()]))
rname = root.name().split('.')[0]
closure = [(word, rname) for word in words]
Then I use Gensim's Poincare model to compute the embeddings. Given the example relations in Gensim's documentation, e.g.
relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal'), ('gib', 'cat')]
I infer that the hypernym needs to be on the right, i.e. the second element of each pair. Here is the model fitting code:
from gensim.models.poincare import PoincareModel
from gensim.viz.poincare import poincare_2d_visualization
model = PoincareModel(relations, size=2, negative=0)
model.train(epochs=50)
fig = poincare_2d_visualization(model, relations, 'WordNet Poincare embeddings')
fig.show()
However, the result is obviously not correct in that it looks nothing like the paper. What am I doing wrong?
I think the main issue here stems from this line:
closure = [(word, rname) for word in words]
You are generating a list in which every word is connected only to rname, which is "mammal". That is, you only get ("columbian_mammoth", "mammal") and are missing the intermediate steps ("columbian_mammoth", "mammoth"), ("mammoth", "elephant"), ("elephant", "proboscidean"), and so on.
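To make the difference concrete, here is a minimal sketch on a toy hierarchy. The dict TOY_HYPONYMS and the helper names are made up for illustration and stand in for WordNet; nothing here calls NLTK or Gensim.

```python
# Toy hierarchy (hypothetical stand-in for WordNet): hypernym -> direct hyponyms.
TOY_HYPONYMS = {
    "mammal": ["proboscidean", "carnivore"],
    "proboscidean": ["elephant"],
    "elephant": ["mammoth"],
    "mammoth": ["columbian_mammoth"],
    "carnivore": ["cat"],
}


def descendants(root):
    """Return every node below root, in depth-first order."""
    found = []
    for child in TOY_HYPONYMS.get(root, []):
        found.append(child)
        found.extend(descendants(child))
    return found


def flat_pairs(root):
    """What the question builds: every word linked only to the root."""
    return [(word, root) for word in descendants(root)]


def edge_pairs(root):
    """Direct (hyponym, hypernym) edges, one per parent/child link."""
    pairs = []
    for child in TOY_HYPONYMS.get(root, []):
        pairs.append((child, root))
        pairs.extend(edge_pairs(child))
    return pairs


flat = flat_pairs("mammal")
edges = edge_pairs("mammal")

# Distinct hypernyms that appear on the right-hand side of each pair list.
print(sorted({parent for _, parent in flat}))   # ['mammal']
print(sorted({parent for _, parent in edges}))  # ['carnivore', 'elephant', 'mammal', 'mammoth', 'proboscidean']
```

Both lists contain the same number of pairs, but the flat version describes a star with "mammal" at the centre, so the model has no hierarchy to recover.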
I suggest a recursive function, append_pairs, to address this issue. I also tuned the arguments to PoincareModel and poincare_2d_visualization a little: negative=10 instead of 0, 20 epochs instead of 50, and num_nodes=None.
from nltk.corpus import wordnet as wn
from gensim.models.poincare import PoincareModel
from gensim.viz.poincare import poincare_2d_visualization


def simple_name(r):
    # 'mammal.n.01' -> 'mammal'
    return r.name().split('.')[0]


def append_pairs(my_root, pairs):
    # Add one (hyponym, hypernym) pair per direct edge, then recurse into the hyponym.
    for w in my_root.hyponyms():
        pairs.append((simple_name(w), simple_name(my_root)))
        append_pairs(w, pairs)
    return pairs


if __name__ == '__main__':
    root = wn.synset('mammal.n.01')
    # words is carried over from the question; it is not used below.
    words = list(set([w for s in root.closure(lambda s: s.hyponyms()) for w in s.lemma_names()]))
    relations = append_pairs(root, [])
    model = PoincareModel(relations, size=2, negative=10)
    model.train(epochs=20)
    fig = poincare_2d_visualization(model, relations, 'WordNet Poincare embeddings', num_nodes=None)
    fig.show()
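One caveat with the recursive approach: WordNet allows a synset to have more than one hypernym, so a shared subtree can be walked more than once and its pairs appended repeatedly. Below is a minimal sketch of a variant that expands each node only once. It runs on a made-up toy graph (TOY_HYPONYMS, naive_pairs and unique_pairs are hypothetical names, not part of NLTK or Gensim), and I have not verified whether duplicate pairs noticeably change the resulting plot.

```python
# Toy DAG (hypothetical): "bat_cat" has two hypernyms, so it is reachable twice.
TOY_HYPONYMS = {
    "mammal": ["bat", "cat"],
    "bat": ["bat_cat"],
    "cat": ["bat_cat"],
    "bat_cat": ["kitten_bat"],
}


def naive_pairs(root, pairs):
    """Plain recursion: revisits a shared subtree once per path leading to it."""
    for child in TOY_HYPONYMS.get(root, []):
        pairs.append((child, root))
        naive_pairs(child, pairs)
    return pairs


def unique_pairs(root):
    """Emit each (hyponym, hypernym) edge once by expanding every node once."""
    pairs = []
    seen_nodes = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in TOY_HYPONYMS.get(node, []):
            pairs.append((child, node))
            # Only schedule a child for expansion the first time it is reached.
            if child not in seen_nodes:
                seen_nodes.add(child)
                stack.append(child)
    return pairs


naive = naive_pairs("mammal", [])
unique = unique_pairs("mammal")
print(len(naive), len(unique))
```

The two functions produce the same set of edges; only the repetition differs.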
The image is not yet as beautiful as in the original source, but at least you can see the clustering now.
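One possible reason for the remaining gap: the figure in the paper is described as an embedding of the transitive closure of the mammals subtree, meaning every (descendant, ancestor) pair and not only the direct edges. Here is a minimal sketch of how such pairs could be generated, again on a made-up toy hierarchy (TOY_HYPERNYMS and transitive_closure_pairs are hypothetical names). With NLTK one would presumably walk hypernyms() upward and stop at the root; that part is not shown, and I have not checked how much it improves the plot.

```python
# Toy hierarchy (hypothetical): node -> direct hypernyms.
TOY_HYPERNYMS = {
    "columbian_mammoth": ["mammoth"],
    "mammoth": ["elephant"],
    "elephant": ["proboscidean"],
    "proboscidean": ["mammal"],
}


def transitive_closure_pairs(hypernyms):
    """Return all (descendant, ancestor) pairs, direct and indirect."""
    pairs = set()
    for node in hypernyms:
        seen = set()
        stack = list(hypernyms[node])
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            pairs.add((node, ancestor))
            # Keep climbing: ancestors of an ancestor are ancestors too.
            stack.extend(hypernyms.get(ancestor, []))
    return sorted(pairs)


closure_pairs = transitive_closure_pairs(TOY_HYPERNYMS)
for pair in closure_pairs:
    print(pair)
```

The result contains the direct edges such as ("mammoth", "elephant") as well as the long-range ones such as ("columbian_mammoth", "mammal"), and it can be passed as the relations list in the same (hyponym, hypernym) order used above.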