allennlp

Giving pretokenized input to sentiment classifier


I am using the sentiment classifier in Python according to this demo.

Is it possible to give pre-tokenized text as input to the predictor? I would like to be able to use my own custom tokenizer.
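For context, the usual way of calling the demo's predictor is with a raw string, roughly as in the sketch below. The archive path is a placeholder (the real URL comes from the demo), but `Predictor.from_path` and `predict(sentence=...)` are the standard AllenNLP calls.

```python
from allennlp.predictors.predictor import Predictor

# Placeholder archive path -- substitute the model URL shown in the demo.
predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/<sentiment-model>.tar.gz"
)

# The predictor expects raw text; it runs its own tokenizer internally.
result = predictor.predict(sentence="a very well-made, funny and entertaining picture.")
print(result["label"], result["probs"])  # predicted label and class probabilities
```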


Solution

  • There are two AllenNLP sentiment analysis models, and both are tightly tied to their tokenization. The GloVe-based one needs tokens that correspond to the pre-trained GloVe embeddings, and similarly the RoBERTa one needs tokens (word pieces) that match its pretraining, as the sketch below illustrates. It does not really make sense to use these models with a different tokenizer.
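As a rough illustration of the mismatch, the sketch below compares word-level tokens (comparable to what the GloVe-based model is trained on) with RoBERTa word pieces. Tokens produced by a custom tokenizer would generally line up with neither vocabulary. The tokenizer classes are AllenNLP's own; treat the "roberta-large" model name as an assumption and use whatever the model archive's config actually specifies.

```python
from allennlp.data.tokenizers import SpacyTokenizer, PretrainedTransformerTokenizer

text = "Pretokenized input is unusual."

# Word-level tokens, comparable to what the GloVe-based model expects.
word_tokens = SpacyTokenizer().tokenize(text)

# Word pieces from the RoBERTa tokenizer ("roberta-large" is an assumed model name).
piece_tokens = PretrainedTransformerTokenizer("roberta-large").tokenize(text)

print([t.text for t in word_tokens])   # e.g. ['Pretokenized', 'input', 'is', 'unusual', '.']
print([t.text for t in piece_tokens])  # word pieces, including special tokens like <s> and </s>
```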