Detect missing space after punctuation and space before punctuation


I want to detect a missing space after a punctuation mark and an extra space before one. I tried matching the regexes [A-Za-z0-9][?.,:!][A-Z] and [A-Za-z0-9]\s+[?.,:!], but both return None when applied to the string "Something is in the air tonight.Or is it ?".

import re

mystring = "Something is in the air tonight.Or is it ?"

missing_space_regex = re.compile('[A-Za-z0-9][?.,:!][A-Z]')
print(missing_space_regex.match(mystring))

extra_space_regex = re.compile(r'[A-Za-z0-9]\s+[?.,:!]')
print(extra_space_regex.match(mystring))

I realize that extra_space_regex as written will not detect the case where the text begins with punctuation, but I can handle that as a special case.


Solution

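  • Both patterns return None because match() only matches at the start of the string, which is why the patterns below begin with .*?. If you can use the third-party regex module instead of re, you can take advantage of its Unicode property classes, such as \p{P} for any punctuation character: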

    import regex
    
    mystring = "Something is in the air tonight.Or is it ?"
    
    missing_space_regex = regex.compile(r'.*?\p{P}\S')
    print(missing_space_regex.match(mystring))
    
    extra_space_regex = regex.compile(r'.*?\s\p{P}')
    print(extra_space_regex.match(mystring))
    

    Outputs:

    <regex.Match object; span=(0, 33), match='Something is in the air tonight.O'>
    <regex.Match object; span=(0, 42), match='Something is in the air tonight.Or is it ?'>
    

    Or, if you would rather keep your own set of punctuation characters and the standard re module:

    import re

    punc = "?.,:!"

    # raw f-strings (rf"...") avoid invalid-escape warnings for \S and \s
    missing_space_re = re.compile(rf".*?[{punc}]\S")
    print(missing_space_re.match(mystring))

    extra_space_re = re.compile(rf".*?\s[{punc}]")
    print(extra_space_re.match(mystring))
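
As an alternative to prefixing the patterns with .*?, the issues can be located with finditer, which scans the whole string the way re.search does. This is only a minimal sketch, not part of the answer above: the helper name find_spacing_issues and the issue labels are made up for illustration, and the punctuation set is simply the one from the question.

```python
import re

# Punctuation set taken from the question; adjust as needed.
PUNCT = "?.,:!"

# Punctuation directly followed by a non-space character (missing space).
MISSING_SPACE = re.compile(rf"[{PUNCT}](?=\S)")
# One or more whitespace characters directly before punctuation (extra space).
EXTRA_SPACE = re.compile(rf"\s+(?=[{PUNCT}])")


def find_spacing_issues(text):
    """Return (kind, index of the punctuation mark) pairs, sorted by index."""
    issues = [("missing space after", m.start()) for m in MISSING_SPACE.finditer(text)]
    issues += [("extra space before", m.end()) for m in EXTRA_SPACE.finditer(text)]
    return sorted(issues, key=lambda issue: issue[1])


mystring = "Something is in the air tonight.Or is it ?"
print(find_spacing_issues(mystring))
```

For the sample string this prints [('missing space after', 31), ('extra space before', 41)], that is, the indices of the "." and the "?". Like the patterns above, it will also flag cases such as 3.14 or ?!, so the character set may need tuning for real text.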