Tags: python, regex, regex-lookarounds

Using regex in Python to remove double quotes with exclusions


I'm trying to remove specific double quotes from text using a regular expression in Python. I want to keep only the double quotes that indicate inches, that is, any double quote that directly follows a number.

txt = 'measurement 1/2" and 3" "remove" end" a " multiple"""'

Expected output: measurement 1/2" and 3" remove end a multiple

This is the closest I've got.

re.sub(r'[^(?!\d+/\d+")]"+', '', txt)


Solution

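  • Simply use a negative lookbehind, which matches a run of double quotes only when it is not preceded by a digit: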

    (?<!\d)"+
    

    See a demo on regex101.com.
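Applied with Python's built-in re module, the lookbehind pattern could be used roughly as follows. This is a minimal sketch on the sample string from the question; the names cleaned and normalized are chosen here for illustration, and the whitespace step is an optional extra that is not part of the pattern itself.

```python
import re

# Sample string from the question.
txt = 'measurement 1/2" and 3" "remove" end" a " multiple"""'

# (?<!\d) is a negative lookbehind: the run of quotes must not follow a digit.
cleaned = re.sub(r'(?<!\d)"+', '', txt)
print(cleaned)

# Optional (an assumption about the desired output): collapse the double
# space left behind where a standalone quote was removed.
normalized = ' '.join(cleaned.split())
print(normalized)
```

Removing the standalone quote between "a" and "multiple" leaves two spaces behind, which is why the second step may be wanted to reproduce the expected output exactly.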


    Your original expression

    [^(?!\d+/\d+")]
    

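    is a negated character class, not a lookahead. Inside square brackets, (, ? and ! are literal characters, so it matches any single character that is not (, ?, !, a digit, +, /, " or ).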
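To see concretely what that character class does, the attempted pattern can be run against the sample string. This is a small sketch using only the standard re module; the variable names are illustrative.

```python
import re

txt = 'measurement 1/2" and 3" "remove" end" a " multiple"""'

# The bracket expression consumes ONE character that is not ( ? ! digit + / " )
# and then the quotes that follow it, so that character is deleted as well.
attempt = r'[^(?!\d+/\d+")]"+'

matches = re.findall(attempt, txt)
print(matches)

broken = re.sub(attempt, '', txt)
print(broken)
```

Each match includes the character in front of the quotes (a space or a letter), which is why this pattern eats into the surrounding words instead of removing only the quotes.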


    Alternatively, you could use the third-party regex module (pip install regex), which supports the (*SKIP)(*FAIL) verbs:

    import regex as re
    
    junk = '''measurement 1/2" and 3" "remove" end" a " multiple"""
    ABC2DEF3"'''
    
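    # First alternative: one digit at a word boundary, an optional fraction,
    # then a quote; (*SKIP)(*FAIL) discards that match so the quote is kept.
    # Second alternative: any other run of quotes, which gets removed.
    # Note: \d matches a single digit; use \d+ to also keep quotes after 12".
    rx = re.compile(r'\b\d(?:/\d+)?"(*SKIP)(*FAIL)|"+')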
    
    cleaned = rx.sub('', junk)
    print(cleaned)
    

    This prints:

    measurement 1/2" and 3" remove end a  multiple
    ABC2DEF3
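If installing a third-party module is not an option, a similar result can be approximated with the standard re module. This is a different technique from the (*SKIP)(*FAIL) version in the answer: capture the measurements that should survive and hand them back unchanged from a replacement function. The function name keep_inches is hypothetical, and this sketch uses \d+ so that multi-digit numbers are also treated as measurements.

```python
import re

junk = '''measurement 1/2" and 3" "remove" end" a " multiple"""
ABC2DEF3"'''

# Group 1: a standalone number (optionally a fraction) followed by one quote.
# Second alternative: any other run of quotes.
rx = re.compile(r'(\b\d+(?:/\d+)?")|"+')

def keep_inches(match):
    # Return the captured measurement untouched; stray quotes become ''.
    return match.group(1) or ''

cleaned = rx.sub(keep_inches, junk)
print(cleaned)
```

Because of the \b, the 3 in ABC2DEF3" is not treated as a standalone number, so its trailing quote is removed, as in the regex-module version.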