I am observing that coreNLP 3.9.2 has started splitting enti_ties into multiple ones like 'enti' , '_', 'ties' while tokenizing
I have tried to use the tokenize.whitespace which solves this problem. But I think this will stop splitting tokens for "cant't" and "dont't"
One thing you can do is replace the underscores (_) with a period (.) and the parser (and tokenizer, I believe) will interpret it as one entity.
E.g. enti_ties
> enti.ties
where the latter is retained as one entity
This doesn't entirely resolve the problem, but serves as a workaround in a pinch.