Consider the following examples:
Original --> Regex
A-B-C SCHOOL INSTITUTION --> ABC SCHOOL INSTITUTION
A B C SCHOOL INSTITUTION --> ABC SCHOOL INSTITUTION
The goal is to join single letters when they are separated by hyphens or spaces. I used the following pattern:
(?<!\w\w)(?:\s+|-)(?!\w\w)
However, I don't want the same rule applied to numbers, and because \w also matches digits, the problem arises. For instance, the following should not be joined and should stay separated as it is:
Original: A 5 M SCHOOL CORPORATION
Regex:    A5M SCHOOL CORPORATION
Desired:  A 5 M SCHOOL CORPORATION
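The behaviour above can be reproduced with a short sketch. Python's re module is an assumption here, since the question itself does not name a language, and the variable names are only illustrative.

```python
import re

# Pattern from the question. \w also matches digits, so "5" is treated
# exactly like a single letter.
original_pattern = re.compile(r'(?<!\w\w)(?:\s+|-)(?!\w\w)')

samples = [
    "A-B-C SCHOOL INSTITUTION",
    "A B C SCHOOL INSTITUTION",
    "A 5 M SCHOOL CORPORATION",
]

# Remove every separator the pattern matches.
results = [original_pattern.sub('', s) for s in samples]

for before, after in zip(samples, results):
    print(f"{before!r} --> {after!r}")
```

The third sample comes out as A5M SCHOOL CORPORATION, which is the unwanted result.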
Thanks
First of all, (?:\s+|-) could be shortened to [\s-]+ or [ -]+. Second, you need a whitelist, not a blacklist. This means you don't look for (?!\w\w). Instead, you look for (?=\w\b), or specifically (?=[a-zA-Z]\b) in this case.
Finally, you don't want digits to be matched. So you need to exclude them before matching any [ -]: (?<!\d)[ -]+.
Putting it all together:
re.sub(r'(?<!\d)[ -]+(?=[a-zA-Z]\b)', '', text)
See live demo here
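As a quick check, the final pattern can be run against the examples from the question. This is a minimal sketch: the helper names and the STRICT variant are not part of the answer above. They are assumptions added for illustration.

```python
import re

# Pattern from the answer: a run of spaces/hyphens that is not preceded by a
# digit and is followed by a single ASCII letter ending at a word boundary.
PATTERN = re.compile(r'(?<!\d)[ -]+(?=[a-zA-Z]\b)')

# Hypothetical stricter variant (an assumption, not from the answer): also
# require that the separator is preceded by a standalone single letter.
STRICT = re.compile(r'(?<=\b[a-zA-Z])[ -]+(?=[a-zA-Z]\b)')

def join_single_letters(text, pattern=PATTERN):
    """Delete separators matched by `pattern` (illustrative helper)."""
    return pattern.sub('', text)

for sample in ("A-B-C SCHOOL INSTITUTION",
               "A B C SCHOOL INSTITUTION",
               "A 5 M SCHOOL CORPORATION"):
    print(f"{sample!r} --> {join_single_letters(sample)!r}")
```

Note that the answer's pattern only checks that the preceding character is not a digit, so a single letter that follows a longer word is attached to it: "SCHOOL A B" becomes "SCHOOLAB". If that matters for your data, the STRICT variant leaves the longer word alone and gives "SCHOOL AB".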