I want to detect a missing space after a punctuation mark and an extra space before a punctuation mark. I attempted to use regexes to match [A-Za-z0-9][?.,:!][A-Z] and [A-Za-z0-9]\s+[?.,:!], but both of these return None when applied to the string "Something is in the air tonight.Or is it ?".
import re
mystring = "Something is in the air tonight.Or is it ?"
missing_space_regex = re.compile('[A-Za-z0-9][?.,:!][A-Z]')
print(missing_space_regex.match(mystring))
extra_space_regex = re.compile(r'[A-Za-z0-9]\s+[?.,:!]')
print(extra_space_regex.match(mystring))
I realize that extra_space_regex as above will not detect the case where the text begins with punctuation, but I can handle that as a special case.
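If you can use the third-party regex module instead of re, you could take advantage of its Unicode character classes, like \p{P} for a punctuation character. The leading .*? is needed because match only matches at the start of the string: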
import regex
mystring = "Something is in the air tonight.Or is it ?"
missing_space_regex = regex.compile(r'.*?\p{P}\S')
print(missing_space_regex.match(mystring))
extra_space_regex = regex.compile(r'.*?\s\p{P}')
print(extra_space_regex.match(mystring))
Outputs:
<regex.Match object; span=(0, 33), match='Something is in the air tonight.O'>
<regex.Match object; span=(0, 42), match='Something is in the air tonight.Or is it ?'>
Or, if you do want to use your chosen punctuation characters and re:
punc = "?.,:!"
missing_space_re = re.compile(rf".*?[{punc}]\S")
print(missing_space_re.match(mystring))
extra_space_re = re.compile(rf".*?\s[{punc}]")
print(extra_space_re.match(mystring))
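As an alternative to prefixing the patterns with .*?, here is a minimal sketch that keeps the two patterns from the question and swaps match for search, which scans the whole string instead of anchoring at position 0. The names missing_space and extra_space are only illustrative, and the character sets are the ones from the question, so adjust them if you need other punctuation.

```python
import re

mystring = "Something is in the air tonight.Or is it ?"

# Patterns from the question, written as raw strings so \s is not
# treated as a (deprecated) string escape.
missing_space = re.compile(r'[A-Za-z0-9][?.,:!][A-Z]')
extra_space = re.compile(r'[A-Za-z0-9]\s+[?.,:!]')

# match() only succeeds if the pattern matches at the very start of the
# string, which is why both patterns gave None in the question.
assert missing_space.match(mystring) is None
assert extra_space.match(mystring) is None

# search() looks for the first match anywhere in the string.
m1 = missing_space.search(mystring)
m2 = extra_space.search(mystring)
print(m1.group(), m1.span())  # t.O (30, 33)
print(m2.group(), m2.span())  # t ? (39, 42)
```

With search, the match object covers only the offending characters, so the span tells you where the problem is rather than always starting at 0.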