deep-learning nlp pytorch data-science bert-language-model

word synonym / antonym detection

I need to create a classifier that takes 2 words and determines if they are synonyms or antonyms. I tried nltk's antsyn-net but it doesn't have enough data.

example:

capitalism <-[antonym]-> socialism
capitalism =[synonym]= free market
god <-[antonym]-> atheism
political correctness <-[antonym]-> free speech
advertising =[synonym]= marketing

I was thinking about taking a BERT model, since some of these relations may already be embedded in it, and fine-tuning it (transfer learning) on a dataset that I found.

Solution

I would suggest the following pipeline:

Construct a training set from an existing dataset of synonyms and antonyms (taken, e.g., from the WordNet thesaurus). You'll need to craft negative examples carefully.
Take a pretrained model such as BERT and fine-tune it on your task. If you choose BERT, it should probably be BertForNextSentencePrediction, where you use your words/phrases instead of sentences and predict 1 if they are synonyms and 0 if they are not; the same goes for antonyms.
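
To make the "craft negative examples carefully" part more concrete, here is a minimal sketch of the first step for the synonym classifier. It is only an illustration: the function name build_pairs, its parameters, and the tiny hand-written word lists (borrowed from the question) are my own assumptions, not part of any library. In practice the synonym groups and antonym pairs would come from a resource such as WordNet. The idea is to use known antonyms as hard negatives and to add a few random pairs as easy negatives, while making sure a random pair is never a known synonym pair.

```python
import random
from itertools import combinations


def build_pairs(synonym_groups, antonym_pairs, n_random=3, seed=0):
    """Build (word_a, word_b, label) triples for a synonym classifier.

    label 1 = synonyms, label 0 = not synonyms.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Positives: every unordered pair inside a synonym group.
    positives = set()
    for group in synonym_groups:
        for a, b in combinations(sorted(group), 2):
            positives.add((a, b))

    # Hard negatives: antonyms are related words, but not synonyms.
    hard_negatives = {tuple(sorted(p)) for p in antonym_pairs} - positives

    # Easy negatives: random pairs that are neither known synonyms
    # nor already used as hard negatives.
    vocab = sorted(
        {w for g in synonym_groups for w in g}
        | {w for p in antonym_pairs for w in p}
    )
    taken = positives | hard_negatives
    easy_negatives = set()
    attempts = 0
    while len(easy_negatives) < n_random and attempts < 1000:
        attempts += 1
        pair = tuple(sorted(rng.sample(vocab, 2)))
        if pair not in taken:
            easy_negatives.add(pair)

    examples = [(a, b, 1) for a, b in sorted(positives)]
    examples += [(a, b, 0) for a, b in sorted(hard_negatives | easy_negatives)]
    rng.shuffle(examples)
    return examples


# Toy data taken from the examples in the question.
synonym_groups = [{"capitalism", "free market"}, {"advertising", "marketing"}]
antonym_pairs = [("capitalism", "socialism"), ("god", "atheism")]

examples = build_pairs(synonym_groups, antonym_pairs, n_random=3)
for word_a, word_b, label in examples:
    print(f"{label}\t{word_a}\t{word_b}")
```

Each triple can then be passed to the tokenizer as a sentence pair (word_a as the first segment, word_b as the second) for the second step. The antonym classifier would use the same construction with the roles swapped, so that synonyms become its hard negatives. One thing worth checking if you go with BertForNextSentencePrediction from Hugging Face transformers: its pretrained head uses label 0 for "B follows A" and 1 for "B is random", so be consistent about which label means "synonym" during fine-tuning.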