I am trying to reproduce, using regex, the classic tokenization trick for dealing with sentences like
"I didn't like that SO question, but I like pizza!"
The solution proposed in the literature is actually very simple: prepend NOT_ to
every token between "didn't" and the next punctuation mark. So our example becomes:
"I didn't NOT_like NOT_that NOT_SO NOT_question, but I like pizza!"
How can I do this in Python with regex?
Thanks!
Split on the negation cue and on punctuation with `re.split`, then prefix every word in the chunk that follows "didn't", and join the pieces back together:
import re

sentence = "I didn't like that SO question, but I like pizza!"

# Split on punctuation or on "didn't"; the capturing group makes
# re.split keep the delimiters in the result list.
words = re.split(r"([,.?:!;]|didn't)", sentence)

# Prefix each word with NOT_ only in the chunk that directly follows "didn't".
not_sentence = "".join([word if (idx == 0 or words[idx - 1] != "didn't")
                        else re.sub(r"(\w+)", r"NOT_\1", word)
                        for idx, word in enumerate(words)])

print(not_sentence)
# I didn't NOT_like NOT_that NOT_SO NOT_question, but I like pizza!
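The same idea generalizes to other negation cues. Here is a minimal sketch, assuming you also want to treat cues like "don't", "not", and "never" the same way; the cue list and the `mark_negation` helper name are my own choices, not part of the question:

```python
import re

# Hypothetical list of negation cues; extend as needed.
# \b word boundaries keep "not" from matching inside words like "nothing".
NEGATION_RE = r"\b(?:didn't|don't|not|never)\b"

def mark_negation(sentence):
    # Split on punctuation or a negation cue, keeping the delimiters.
    parts = re.split(r"([,.?:!;]|" + NEGATION_RE + r")", sentence)
    out = []
    for idx, part in enumerate(parts):
        if idx > 0 and re.fullmatch(NEGATION_RE, parts[idx - 1]):
            # The previous chunk was a negation cue: prefix every word.
            part = re.sub(r"(\w+)", r"NOT_\1", part)
        out.append(part)
    return "".join(out)

print(mark_negation("I didn't like that SO question, but I like pizza!"))
# I didn't NOT_like NOT_that NOT_SO NOT_question, but I like pizza!
print(mark_negation("I never liked it."))
# I never NOT_liked NOT_it.
```

Note that NLTK ships a similar utility (`nltk.sentiment.util.mark_negation`) that works on token lists rather than raw strings, which may be preferable in a larger pipeline.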