I am working on a natural language processing project in Java. I have a requirement where I want to identify words that belong to similar semantic groups.
E.g. if words such as study, university, graduate, attend are found, I want them to be categorized as being related to education.
If words such as golfer, batsman, athlete are found, they should all be categorized under a parent like sportsperson.
Is there a way I can achieve this task without using any training approach? Is there some tool like WordNet that can be used directly? Any pointers would be greatly appreciated!
Thanx cheers!! :-)
Yes, you can use WordNet. For example, you can search among the hypernyms of the current word (e.g. study) for your category word (e.g. education or sport). There are JAWS, JWNL, and other libraries; see related question.
Alternatively, you can compute similarity between candidate words and category words - e.g. by using ws4j or Semilar.
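The similarity approach can be sketched the same way. The code below is not the ws4j or Semilar API; it is a self-contained illustration of one common taxonomy-based measure, Wu-Palmer similarity, computed as 2 * depth(lcs) / (depth(a) + depth(b)) with the root at depth 1. The `PARENT` taxonomy, the class name, and both method names are hypothetical, invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ToyWuPalmer {
    // Hypothetical toy taxonomy (NOT real WordNet content): node -> parent.
    // The root "entity" has no entry.
    static final Map<String, String> PARENT = Map.of(
        "person", "entity",
        "athlete", "person",
        "golfer", "athlete",
        "batsman", "athlete",
        "student", "person",
        "institution", "entity",
        "university", "institution");

    // Path from the node up to its root, node first.
    static List<String> pathToRoot(String node) {
        List<String> path = new ArrayList<>();
        for (String n = node; n != null; n = PARENT.get(n)) {
            path.add(n);
        }
        return path;
    }

    // Wu-Palmer similarity; 0.0 if the two words share no ancestor.
    static double similarity(String a, String b) {
        List<String> pa = pathToRoot(a);
        List<String> pb = pathToRoot(b);
        for (int i = 0; i < pa.size(); i++) {
            if (pb.contains(pa.get(i))) {
                // First shared node is the lowest common subsumer (lcs).
                int depthLcs = pa.size() - i;
                return 2.0 * depthLcs / (pa.size() + pb.size());
            }
        }
        return 0.0;
    }

    // Pick the category word with the highest similarity to the candidate.
    static String bestCategory(String word, List<String> categories) {
        String best = null;
        double bestScore = -1.0;
        for (String c : categories) {
            double s = similarity(word, c);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> categories = List.of("athlete", "university");
        System.out.println(bestCategory("golfer", categories));
        System.out.println(similarity("golfer", "batsman"));
    }
}
```

In practice you would probably also want a minimum score threshold, so that a word unrelated to every category is left uncategorized rather than assigned to the least bad match.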