How to get domain of words using WordNet in Python?

How can I find the domain of words using the NLTK Python module and WordNet?

Suppose I have words like (transaction, Demand Draft, cheque, passbook) and the domain for all these words is "BANK". How can we get this using nltk and WordNet in Python?

I am trying to do this through the hypernym and hyponym relationships:

For example:

from nltk.corpus import wordnet as wn
sports = wn.synset('sport.n.01')
sports.hyponyms()
[Synset('judo.n.01'), Synset('athletic_game.n.01'), Synset('spectator_sport.n.01'), Synset('contact_sport.n.01'), Synset('cycling.n.01'), Synset('funambulism.n.01'), Synset('water_sport.n.01'), Synset('riding.n.01'), Synset('gymnastics.n.01'), Synset('sledding.n.01'), Synset('skating.n.01'), Synset('skiing.n.01'), Synset('outdoor_sport.n.01'), Synset('rowing.n.01'), Synset('track_and_field.n.01'), Synset('archery.n.01'), Synset('team_sport.n.01'), Synset('rock_climbing.n.01'), Synset('racing.n.01'), Synset('blood_sport.n.01')]


bark = wn.synset('bark.n.02')


  • There is no explicit domain information in either the Princeton WordNet or NLTK's WordNet API.

    I would recommend getting a copy of the WordNet Domain resource and then linking your synsets to its domain labels.

    After you've registered and completed the download, you will see a text file named wn-domains-3.2-20070223. It is tab-delimited: the first column holds the offset-PartOfSpeech identifier of a synset and the second column holds its domain tags, separated by spaces, e.g.

    00584282-v  military pedagogy
    00584395-v  military school university
    00584526-v  animals pedagogy
    00584634-v  pedagogy
    00584743-v  school university
    00585097-v  school university
    00585271-v  pedagogy
    00585495-v  pedagogy
    00585683-v  psychological_features
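
As a small illustration of that format, one line can be split into its identifier and its list of domain tags. This is only a sketch: the helper name parse_wnd_line is made up for the example, and the sample lines are copied from the excerpt above on the assumption that the two columns are separated by a single tab.

```python
# Two lines copied from the excerpt above (a tab separates the two columns).
sample = "00584282-v\tmilitary pedagogy\n00584634-v\tpedagogy\n"

def parse_wnd_line(line):
    """Split one WordNet Domain line into (synset_id, [domain tags])."""
    ssid, doms = line.rstrip("\n").split("\t")
    return ssid, doms.split()

for line in sample.splitlines():
    print(parse_wnd_line(line))
```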

    You can then use the following script to look up the domain(s) of each synset:

    from collections import defaultdict
    from nltk.corpus import wordnet as wn
    # Load the WordNet Domain file into two lookup tables.
    domain2synsets = defaultdict(list)
    synset2domains = defaultdict(list)
    with open('wn-domains-3.2-20070223', 'r') as fin:
        for line in fin:
            ssid, doms = line.strip().split('\t')
            doms = doms.split()
            synset2domains[ssid] = doms
            for d in doms:
                domain2synsets[d].append(ssid)
    # Get the domains of a given synset.
    for ss in wn.all_synsets():
        # Build the same "offset-pos" key that the file uses, e.g. 00584282-v.
        ssid = str(ss.offset()).zfill(8) + "-" + ss.pos()
        if synset2domains[ssid]:  # not all synsets are in WordNet Domain.
            print(ss, ssid, synset2domains[ssid])
    # Get the synsets of a given domain (first three per domain).
    for dom in sorted(domain2synsets):
        print(dom, domain2synsets[dom][:3])
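
To get back to the original question (one domain for a whole group of words), one possible approach is to count how often each domain tag occurs across the synsets of those words and take the most frequent one. The sketch below is an assumption on my part rather than anything WordNet Domain prescribes: rank_domains is a hypothetical helper, and the small dictionary stands in for the synset2domains table built by the script above, filled with entries from the file excerpt.

```python
from collections import Counter

def rank_domains(ssids, synset2domains):
    """Count how often each domain tag occurs across the given synset IDs."""
    counts = Counter()
    for ssid in ssids:
        # .get() skips IDs that have no entry in WordNet Domain.
        counts.update(synset2domains.get(ssid, []))
    return counts.most_common()

# Stand-in for the table loaded from wn-domains-3.2-20070223.
synset2domains = {
    "00584282-v": ["military", "pedagogy"],
    "00584395-v": ["military", "school", "university"],
    "00584634-v": ["pedagogy"],
}

# The last ID is deliberately unknown to show that it is ignored.
ranked = rank_domains(["00584282-v", "00584634-v", "99999999-n"], synset2domains)
print(ranked)
```

With real data, the list of IDs would be built from the synsets of each word (for instance via wn.synsets(word)), using the same offset-and-POS formatting as in the script above.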

    Also look for wn-affect within the WordNet Domain resource; it is very useful for disambiguating words for sentiment.

    The updated NLTK v3.0 comes with the Open Multilingual WordNet, and since the French synsets share the same offset IDs, you can simply use the WND as a crosslingual resource. The French lemma names can be accessed like this:

    # Get the domains of a given synset, along with its French lemma names.
    for ss in wn.all_synsets():
        ssid = str(ss.offset()).zfill(8) + "-" + ss.pos()
        if synset2domains[ssid]:  # not all synsets are in WordNet Domain.
            print(ss, ss.lemma_names('fre'), ssid, synset2domains[ssid])

    Note that recent versions of NLTK changed synset properties into "get" functions, e.g. Synset.offset -> Synset.offset().
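
If the same script has to run against both API styles, the key can be built with a small helper that checks whether offset and pos are callable. This is a minimal sketch: synset_id is a hypothetical name, and the two classes are stand-ins that only imitate the old and new attribute styles, not real NLTK classes.

```python
def synset_id(ss):
    """Build the 'offset-pos' key whether offset/pos are attributes or methods."""
    offset = ss.offset() if callable(ss.offset) else ss.offset
    pos = ss.pos() if callable(ss.pos) else ss.pos
    return str(offset).zfill(8) + "-" + pos

# Stand-in objects imitating the two API styles (assumptions for illustration).
class OldStyle:
    offset = 584282
    pos = "v"

class NewStyle:
    def offset(self):
        return 584282

    def pos(self):
        return "v"

print(synset_id(OldStyle()), synset_id(NewStyle()))
```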