Princeton WordNet database: two different synset identifiers?


I am trying to make sense of the different identifiers in the Princeton WordNet database. I am using version 3.1. You can read about the structure here, but my focus is on the synsets table.

The Synset Table
The synsets table is one of the most important tables in the database. It is responsible for housing all the definitions within WordNet. Each row in the synsets table has a synsetid, a definition, a pos (part-of-speech field) and a lexdomainid (which links to the lexdomain table). There are 117373 synsets in the WordNet database.

When I search for the word joy in my senses table, I see four different results (2 nouns and 2 verbs). From there, I can identify the sense I am looking for, which is the one with the definition:

"the emotion of great happiness"

So I have found the result I am looking for. Its synset id is 107542591, and I can search for this id to find other words with the same sense.
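The two-step lookup described above (find the senses of "joy", then search by the chosen synset id) can be sketched in Python with `sqlite3` standing in for MySQL. This is a minimal sketch under assumptions: the tables are a heavily reduced, hypothetical subset of the WordNet SQL schema (the `words` table with `wordid`/`lemma` columns is assumed, not taken from the question), the rows are toy data, and the function names are made up for illustration.

```python
import sqlite3

# Hypothetical, heavily reduced stand-in for the WordNet SQL tables:
# only the columns this lookup needs. The real tables have more columns,
# and the `words` layout here is an assumption.
SCHEMA = """
CREATE TABLE words   (wordid INTEGER PRIMARY KEY, lemma TEXT);
CREATE TABLE synsets (synsetid INTEGER PRIMARY KEY, pos TEXT, definition TEXT);
CREATE TABLE senses  (wordid INTEGER, synsetid INTEGER);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Toy data for illustration only; the wordid values are made up.
    conn.executemany(
        "INSERT INTO words VALUES (?, ?)",
        [(1, "joy"), (2, "joyousness"), (3, "joyfulness")],
    )
    conn.execute(
        "INSERT INTO synsets VALUES (?, ?, ?)",
        (107542591, "n", "the emotion of great happiness"),
    )
    conn.executemany(
        "INSERT INTO senses VALUES (?, ?)",
        [(1, 107542591), (2, 107542591), (3, 107542591)],
    )
    return conn


def senses_of(conn: sqlite3.Connection, lemma: str) -> list[tuple[int, str, str]]:
    """Step 1: (synsetid, pos, definition) for every sense of `lemma`."""
    return conn.execute(
        """
        SELECT s.synsetid, s.pos, s.definition
        FROM words AS w
        JOIN senses AS se ON se.wordid = w.wordid
        JOIN synsets AS s ON s.synsetid = se.synsetid
        WHERE w.lemma = ?
        ORDER BY s.synsetid
        """,
        (lemma,),
    ).fetchall()


def lemmas_in(conn: sqlite3.Connection, synsetid: int) -> list[str]:
    """Step 2: every lemma that shares the synset `synsetid`."""
    rows = conn.execute(
        """
        SELECT w.lemma
        FROM senses AS se
        JOIN words AS w ON w.wordid = se.wordid
        WHERE se.synsetid = ?
        ORDER BY w.lemma
        """,
        (synsetid,),
    )
    return [lemma for (lemma,) in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(senses_of(conn, "joy"))
    # [(107542591, 'n', 'the emotion of great happiness')]
    print(lemmas_in(conn, 107542591))
    # ['joy', 'joyfulness', 'joyousness']
```

Against a real WordNet MySQL database, the same two queries would need the actual column names checked first, and the toy `build_demo_db` would be replaced by a MySQL connection.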

However, when I use some online versions of WordNet and search for words in the synset "the emotion of great happiness", I see a different type of identifier: 07527352-n.

For example, you can see it at the top-left corner of this site. On that same site, the address bar refers to this identifier as the synset id: &synset=07527352-n.

I would like to know how to retrieve the second type of identifier for a given synset. I've read through the documentation here and searched through the raw data files, but I cannot figure it out.

Thank you!


Solution

  • There are two things going on.

    First, the MySQL ids are integers, and an integer cannot keep a leading 0, so instead of starting with 0 the ids get a part-of-speech prefix digit. (Specifically, nouns get a 1 prefix, verbs 2, adjectives 3, and adverbs 4: see the WordNet identifiers section at http://wordnet-rdf.princeton.edu/ )

    Second, 07542591 is the offset used by WordNet 3.1 (I've checked both the raw WordNet files and the SQL files, and they both use it).

    "07527352" is from an older version of WordNet. Offsets are not stable between releases, so the same synset can have a different offset in each version. In the case of Chinese WordNet, I believe they use WordNet 3.0: http://compling.hss.ntu.edu.sg/cow/

    Additionally, https://stackoverflow.com/a/33348009/841830 has more information. Strangely, I have not been able to track down a simple 3.0-to-3.1 conversion table yet, but I'm sure I've seen one.
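The prefix arithmetic from the first point can be sketched in Python. This is a minimal sketch, assuming the 1/2/3/4 prefix scheme described above, 8-digit offsets, and the usual WordNet part-of-speech letters (n, v, a, r). Adjective satellites (letter s) are deliberately not handled, and the function names are hypothetical.

```python
# Leading digit of the SQL synsetid -> WordNet part-of-speech letter,
# following the 1/2/3/4 scheme described in the answer.
# Assumption: adverbs use the letter "r"; satellites ("s") are not handled.
POS_BY_PREFIX = {1: "n", 2: "v", 3: "a", 4: "r"}
PREFIX_BY_POS = {pos: prefix for prefix, pos in POS_BY_PREFIX.items()}

# Offsets are 8 decimal digits, so the prefix digit sits at 10**8.
OFFSET_BASE = 10**8


def sql_id_to_offset_pos(synsetid: int) -> str:
    """Turn a SQL synsetid such as 107542591 into '07542591-n'."""
    prefix, offset = divmod(synsetid, OFFSET_BASE)
    if prefix not in POS_BY_PREFIX:
        raise ValueError(f"unexpected part-of-speech prefix in {synsetid}")
    # Zero-pad back to 8 digits, restoring the leading 0 the integer lost.
    return f"{offset:08d}-{POS_BY_PREFIX[prefix]}"


def offset_pos_to_sql_id(identifier: str) -> int:
    """Turn an 'offset-pos' identifier such as '07542591-n' into 107542591."""
    offset_text, sep, pos = identifier.partition("-")
    if (
        not sep
        or pos not in PREFIX_BY_POS
        or len(offset_text) != 8
        or not offset_text.isdigit()
    ):
        raise ValueError(f"not an 8-digit offset-pos identifier: {identifier!r}")
    return PREFIX_BY_POS[pos] * OFFSET_BASE + int(offset_text)


if __name__ == "__main__":
    print(sql_id_to_offset_pos(107542591))     # 07542591-n
    print(offset_pos_to_sql_id("07542591-n"))  # 107542591
```

Note that this only recovers the 3.1 identifier: 107542591 becomes 07542591-n, not the 3.0 identifier 07527352-n shown on the online site. Getting from one version's offset to the other's needs a version mapping, which this arithmetic cannot provide.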