I have this code for removing all punctuation from a regex string:
import regex as re
re.sub(ur"\p{P}+", "", txt)
How would I change it to allow hyphens? If you could explain how you did it, that would be great. I understand that here, correct me if I'm wrong, P with anything after it is punctuation.
[^\P{P}-]+
\P
is the complementary of \p
- not punctuation. So this matches anything that is not (not punctuation or a dash) - resulting in all punctuation except dashes.
Example: http://www.rubular.com/r/JsdNM3nFJ3
If you want a non-convoluted way, an alternative is \p{P}(?<!-)
: match all punctuation, and then check it wasn't a dash (using negative lookbehind).
Working example: http://www.rubular.com/r/5G62iSYTdk