import re
string = re.sub(r'-\n', '', string)
I want to tokenize the words of a text. The problem is that every word split across the end of a line is tokenized incorrectly, so I have to remove the hyphen that comes before a newline character.
Thanks for your help!
Try using a lookahead to match the newline, rather than consuming it as part of the substitution, so that only the hyphen is removed and the newline stays in the string:
string = re.sub(r'-(?=\n)', '', string)
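To see the difference between the two patterns, here is a minimal sketch run on a made-up sample string (the `text` value and the variable names `joined` and `kept_newline` are assumptions for illustration, not from the question):

```python
import re

# Hypothetical sample text: two words are hyphenated across line breaks.
text = "This is an exam-\nple of hyphen-\nated text."

# Pattern from the question: consumes the hyphen AND the newline,
# so the two halves of each word are joined.
joined = re.sub(r'-\n', '', text)

# Lookahead pattern: the newline is only asserted, not consumed,
# so just the hyphen is removed and the line break remains.
kept_newline = re.sub(r'-(?=\n)', '', text)

print(repr(joined))
print(repr(kept_newline))
```

Which one fits depends on the tokenizer: if it treats a newline as a separator, the lookahead version will probably still yield the two halves as separate tokens, while the first pattern joins them into one word.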