python nlp wordnet

Are there duplicates in synset.hypernym_distances()?


I was looking at the function hypernym_distances(), and in its list of all possible hypernyms I saw two "entity.n.01" elements with different distances. What is the reason for this? Could anyone please explain?

In Python:

print([{i[0] : i[1]} for i in wn.synset('person.n.01').hypernym_distances()])

The code above displays every hypernym on the way up to the final term 'entity'.

The output is :

[{Synset('entity.n.01'): 3}, {Synset('object.n.01'): 4}, {Synset('physical_entity.n.01'): 5}, {Synset('organism.n.01'): 1}, {Synset('person.n.01'): 0}, {Synset('entity.n.01'): 6}, {Synset('living_thing.n.01'): 2}, {Synset('physical_entity.n.01'): 2}, {Synset('causal_agent.n.01'): 1}, {Synset('whole.n.02'): 3}]

Each element is a name-value pair: the name is a synset that is a hypernym of the specified word, and the value is its distance.

Could anyone explain why 'entity.n.01' appears twice in the output?

{Synset('entity.n.01'): 6}

{Synset('entity.n.01'): 3}


Solution

  • When code is obtuse, break it down.

    Also, try not to use one-liners; they usually bring no computational speed-up, only faster typing.

    Know what you're iterating through

    So let's break it down.

    The print statement wraps a list comprehension that creates a one-entry dictionary for every element it iterates over:

    print([{i[0] : i[1]} for i in wn.synset('person.n.01').hypernym_distances()])
    

    It looks like the loop itself can be simplified. First, assign the synset to a variable (I assume you will want to run the same operation on several synsets rather than only on person.n.01):

    person = wn.synset('person.n.01')
    

    Now let's see what person.hypernym_distances() returns:

    >>> person.hypernym_distances()
    {(Synset('person.n.01'), 0), (Synset('organism.n.01'), 1), (Synset('whole.n.02'), 3), (Synset('physical_entity.n.01'), 5), (Synset('causal_agent.n.01'), 1), (Synset('entity.n.01'), 3), (Synset('living_thing.n.01'), 2), (Synset('physical_entity.n.01'), 2), (Synset('entity.n.01'), 6), (Synset('object.n.01'), 4)}
    

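    person.hypernym_distances() already returns a set of tuples, where the first element is the hypernym and the second element is the distance. Because the set holds (synset, distance) tuples, it deduplicates on the whole pair, not on the synset alone: (Synset('entity.n.01'), 3) and (Synset('entity.n.01'), 6) are different tuples, so both are kept. The two distances correspond to two hypernym paths from person.n.01 up to entity.n.01: a short one through causal_agent.n.01 and a longer one through organism.n.01. Synset('physical_entity.n.01') appears twice (distances 2 and 5) for the same reason.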
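To see where the two distances come from, here is a minimal sketch that imitates the traversal on a hand-built toy graph. The edges in TOY_HYPERNYMS are transcribed by hand from the output above, toy_hypernym_distances is a hypothetical helper rather than NLTK's own code, and plain strings stand in for Synset objects so the example runs without NLTK.

```python
# Toy hypernym graph, transcribed by hand from the output above (an assumption,
# not loaded from WordNet). Keys are synset names, values are direct hypernyms.
TOY_HYPERNYMS = {
    "person.n.01": ["organism.n.01", "causal_agent.n.01"],
    "organism.n.01": ["living_thing.n.01"],
    "living_thing.n.01": ["whole.n.02"],
    "whole.n.02": ["object.n.01"],
    "object.n.01": ["physical_entity.n.01"],
    "causal_agent.n.01": ["physical_entity.n.01"],
    "physical_entity.n.01": ["entity.n.01"],
    "entity.n.01": [],
}


def toy_hypernym_distances(name, distance=0):
    """Collect (name, distance) pairs along every path up to the root."""
    pairs = {(name, distance)}
    for parent in TOY_HYPERNYMS[name]:
        pairs |= toy_hypernym_distances(parent, distance + 1)
    return pairs


pairs = toy_hypernym_distances("person.n.01")
entity_distances = sorted(d for name, d in pairs if name == "entity.n.01")
print(entity_distances)  # [3, 6]: one distance per path to the root
print(len(pairs))        # 10 pairs, the same number as in the output above
```

Each path to the root contributes its own (name, distance) tuple, and the set keeps both because the tuples differ in their second element.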

    Unpacking iterable of tuples/iterables in a loop

    When iterating over an iterable of tuples, you can "unpack" each tuple directly in the loop (see "Unpacking a list / tuple of pairs into two lists / tuples" and "How can I iterate through two lists in parallel?"):

    >>> from nltk.corpus import wordnet as wn
    >>> wn.synset('person.n.01')
    Synset('person.n.01')
    >>> person = wn.synset('person.n.01')
    >>> person.hypernym_distances()
    {(Synset('person.n.01'), 0), (Synset('organism.n.01'), 1), (Synset('whole.n.02'), 3), (Synset('physical_entity.n.01'), 5), (Synset('causal_agent.n.01'), 1), (Synset('entity.n.01'), 3), (Synset('living_thing.n.01'), 2), (Synset('physical_entity.n.01'), 2), (Synset('entity.n.01'), 6), (Synset('object.n.01'), 4)}
    >>> for ss, count in person.hypernym_distances():
    ...     print(ss, '\t', count)
    ... 
    Synset('person.n.01')    0
    Synset('organism.n.01')      1
    Synset('whole.n.02')     3
    Synset('physical_entity.n.01')   5
    Synset('causal_agent.n.01')      1
    Synset('entity.n.01')    3
    Synset('living_thing.n.01')      2
    Synset('physical_entity.n.01')   2
    Synset('entity.n.01')    6
    Synset('object.n.01')    4
    

    By iterating through the set of tuples this way, you avoid the ugly (i[0], i[1]) for i in iterable_of_tuples syntax. Instead, write (a, b) for a, b in iterable_of_tuples.
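As a small illustration of the same unpacking outside the interpreter, the sketch below sorts the pairs by distance before printing them. The pairs are hard-coded from the output above, with plain strings standing in for Synset objects so it runs without NLTK; with NLTK installed, person.hypernym_distances() could be used in their place.

```python
# Pairs transcribed from the output above; strings stand in for Synset objects.
pairs = {
    ("person.n.01", 0), ("organism.n.01", 1), ("whole.n.02", 3),
    ("physical_entity.n.01", 5), ("causal_agent.n.01", 1),
    ("entity.n.01", 3), ("living_thing.n.01", 2),
    ("physical_entity.n.01", 2), ("entity.n.01", 6), ("object.n.01", 4),
}

# Sort by distance first, then by name, so the order is reproducible.
ordered = sorted(pairs, key=lambda pair: (pair[1], pair[0]))

# Unpack each tuple directly in the loop header.
for name, distance in ordered:
    print(f"{distance}\t{name}")
```

Sorting is optional; it is only there because a set has no guaranteed iteration order.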

    List vs Dict comprehension

    It seems like you are trying to put the tuples from person.hypernym_distances() into a dictionary where the key is the synset and the value is the count.

    I guess the confusion comes from mixing up list comprehension and dictionary comprehension. There is no need to create a new dictionary for every element in person.hypernym_distances(). Instead, I think a dictionary comprehension is what you're looking for (bear in mind that a dictionary keeps one value per key, so a synset that appears with two distances retains only the one iterated last):

    >>> {ss:count for ss, count in person.hypernym_distances()}
    {Synset('object.n.01'): 4, Synset('whole.n.02'): 3, Synset('living_thing.n.01'): 2, Synset('organism.n.01'): 1, Synset('entity.n.01'): 6, Synset('person.n.01'): 0, Synset('causal_agent.n.01'): 1, Synset('physical_entity.n.01'): 2}
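Because a dictionary keeps one value per key, the comprehension above silently drops one of the two distances for entity.n.01 and for physical_entity.n.01, and which one survives depends on set iteration order. If the shortest distance is the one that matters, one possible sketch is to take the minimum explicitly. The pairs are hard-coded from the output above with strings in place of Synset objects, and shortest is a hypothetical name.

```python
# Pairs transcribed from the output above; strings stand in for Synset objects.
pairs = {
    ("person.n.01", 0), ("organism.n.01", 1), ("whole.n.02", 3),
    ("physical_entity.n.01", 5), ("causal_agent.n.01", 1),
    ("entity.n.01", 3), ("living_thing.n.01", 2),
    ("physical_entity.n.01", 2), ("entity.n.01", 6), ("object.n.01", 4),
}

shortest = {}
for name, distance in pairs:
    # Keep the smaller distance when a synset shows up more than once.
    shortest[name] = min(distance, shortest.get(name, distance))

print(shortest["entity.n.01"])           # 3
print(shortest["physical_entity.n.01"])  # 2
```

Unlike the plain comprehension, this result does not depend on the order in which the set happens to be iterated.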
    

    Casting list of tuples into dictionary

    Actually, if the dictionary above is what you want, note that given an iterable of 2-item tuples, passing it to dict() automatically uses the first item of each tuple as the key and the second as the value:

    >>> dict(person.hypernym_distances())
    {Synset('object.n.01'): 4, Synset('whole.n.02'): 3, Synset('living_thing.n.01'): 2, Synset('organism.n.01'): 1, Synset('entity.n.01'): 6, Synset('person.n.01'): 0, Synset('causal_agent.n.01'): 1, Synset('physical_entity.n.01'): 2}
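When a key repeats, dict() keeps the value from the last pair it sees. A list makes this deterministic, unlike the set returned by hypernym_distances(), whose iteration order is not guaranteed. The two pairs below are taken from the output above, with strings standing in for Synset objects.

```python
# Two pairs with the same key, in a fixed order.
pairs_in_order = [("entity.n.01", 3), ("entity.n.01", 6)]

as_dict = dict(pairs_in_order)
print(as_dict)  # {'entity.n.01': 6}

# Reversing the order changes which distance survives.
reversed_dict = dict(reversed(pairs_in_order))
print(reversed_dict)  # {'entity.n.01': 3}
```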
    

    See also

    Python's standard library also has high-performance container data types that can handle these, and they come with nifty functions too. See https://docs.python.org/3/library/collections.html
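For instance, collections.defaultdict can keep every distance per synset instead of discarding one. This is only a sketch: the pairs are hard-coded from the output above with strings standing in for Synset objects, and all_distances is a hypothetical name.

```python
from collections import defaultdict

# Pairs transcribed from the output above; strings stand in for Synset objects.
pairs = {
    ("person.n.01", 0), ("organism.n.01", 1), ("whole.n.02", 3),
    ("physical_entity.n.01", 5), ("causal_agent.n.01", 1),
    ("entity.n.01", 3), ("living_thing.n.01", 2),
    ("physical_entity.n.01", 2), ("entity.n.01", 6), ("object.n.01", 4),
}

grouped = defaultdict(list)
for name, distance in pairs:
    grouped[name].append(distance)

# Sort each list so the result does not depend on set iteration order.
all_distances = {name: sorted(ds) for name, ds in grouped.items()}

print(all_distances["entity.n.01"])  # [3, 6]
print(all_distances["person.n.01"])  # [0]
```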