deep-learning nlp pytorch data-science bert-language-model

word synonym / antonym detection


I need to create a classifier that takes 2 words and determines if they are synonyms or antonyms. I tried nltk's antsyn-net but it doesn't have enough data.

Examples:

  • capitalism <-[antonym]-> socialism
  • capitalism =[synonym]= free market
  • god <-[antonym]-> atheism
  • political correctness <-[antonym]-> free speech
  • advertising =[synonym]= marketing

I was thinking about taking a BERT model, since some of these relations may already be embedded in it, and fine-tuning it (transfer learning) on a dataset that I found.


Solution

  • I would suggest the following pipeline:

    1. Construct a training set from an existing dataset of synonyms and antonyms (taken e.g. from the WordNet thesaurus). You'll need to craft negative examples carefully.
    2. Take a pretrained model such as BERT and fine-tune it on your task. If you choose BERT, it should probably be BertForNextSentencePrediction, where you pass your words/phrases in place of the two sentences and predict 1 if they are synonyms and 0 if they are not; do the same for antonyms.
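As a rough illustration of step 1, here is a minimal sketch of how the labelled pairs for the synonym classifier could be assembled. The word lists are toy stand-ins for a real thesaurus export, and the function name `build_synonym_dataset` and its parameters are made up for this example. The idea is to mix two kinds of negatives: antonym pairs (hard negatives, since they are topically close) and random pairs that appear in neither list.

```python
import random

# Toy stand-in for a thesaurus export (hypothetical data, not taken from WordNet).
SYNONYMS = [("advertising", "marketing"), ("capitalism", "free market"), ("big", "large")]
ANTONYMS = [("capitalism", "socialism"), ("big", "small"), ("hot", "cold")]


def build_synonym_dataset(synonyms, antonyms, n_random=3, seed=0):
    """Return (phrase_a, phrase_b, label) triples; label 1 = synonym, 0 = not."""
    rng = random.Random(seed)
    # Unordered pairs that are already known to be related in some way.
    related = {frozenset(pair) for pair in synonyms + antonyms}

    # Positives: the known synonym pairs.
    examples = [(a, b, 1) for a, b in synonyms]
    # Hard negatives: antonyms are related words but must not count as synonyms.
    examples += [(a, b, 0) for a, b in antonyms]

    # Easy negatives: random pairs that are neither listed synonyms nor antonyms.
    vocab = sorted({word for pair in synonyms + antonyms for word in pair})
    seen = set()
    attempts = 0
    while len(seen) < n_random and attempts < 1000:
        attempts += 1
        a, b = rng.sample(vocab, 2)
        key = frozenset((a, b))
        if key in related or key in seen:
            continue
        seen.add(key)
        examples.append((a, b, 0))
    return examples


dataset = build_synonym_dataset(SYNONYMS, ANTONYMS)
for phrase_a, phrase_b, label in dataset:
    print(label, phrase_a, "|", phrase_b)
```

Each triple can then be fed to the model as a (first segment, second segment, label) example. Swapping the roles of the two lists gives the training set for the antonym classifier. With a real thesaurus, a random pair can still be an unlisted synonym, so it is worth filtering the random negatives more strictly than this sketch does.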