Tags: python, regex, regex-lookarounds

How to use a negative lookahead to exclude the text between two characters from a search


string1 = '%(example_1).40s-a%(example-2)s_-%(example3)s_s1'

output

'-a', '_-', '_s1'

I need to drop everything between each '%' and the 's' that closes it.

Attempt 1:

re.findall("[-_a-z0-9]+(?![^%]*\s)", string1)

result:

['example_1', '0s-a', 'example-', 's_-', 'example', 's_s1']

Attempt 2:

re.findall("[-_a-z0-9]+(?![^(]*\))", string1)

result:

['40s-a', 's_-', 's_s1']

Attempt 2 is sort of close, except that it matched '40s', which lies between % and s, and it overmatched the 's' in the other entries.

expected output

['-a', '_-', '_s1']

EDIT:

I want to confirm how to skip the text between % and s.

string2 = 'abc123%(example_1).40s-a%(example-2)s_-%(example3)s_s1'

expected output: ['abc123', '-a', '_-', '_s1']

string3 = 'abc123%(example_1).40s-a%(example-2)s_-%(examples3).40s'

expected output: ['abc123', '-a', '_-']


Solution

  • I would rather use the "negative" approach: call re.split with a non-greedy pattern that matches the characters between % and s, and keep what is left. The regex is then very simple.

    The only kludge: you have to filter out the empty fields (here, the one produced at the start of the string).

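    import re
    
    string1 = '%(example_1).40s-a%(example-2)s_-%(example3)s_s1'
    # Split on each "%...s" run; "if x" drops the empty field left by the leading match.
    result = [x for x in re.split(r"%.*?s", string1) if x]
    
    print(result)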
    

    result:

    ['-a', '_-', '_s1']
    

    Edit: that simple expression doesn't work if the parentheses contain an "s" character (as in string3). In that case, replace it with a more complex one:

    %\(.*?\).*?s|%.*?s
    

    (This alternation tries the form with parentheses first and falls back to the previous simple expression, so it still matches when there are no parentheses.)
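
As a rough check of the two expressions above, here is a minimal sketch that runs both against the strings from the question. The names SIMPLE, WITH_PARENS and literal_parts are made up for this illustration. It assumes every placeholder ends in "s", as in the question; a placeholder ending in another conversion character would make `.*?s` run on to the next "s" in the string.

```python
import re

# Hypothetical names, for illustration only.
SIMPLE = re.compile(r"%.*?s")                     # first expression from the answer
WITH_PARENS = re.compile(r"%\(.*?\).*?s|%.*?s")   # alternation from the edit


def literal_parts(text, pattern=WITH_PARENS):
    """Split text on the placeholder pattern and drop the empty fields."""
    return [part for part in pattern.split(text) if part]


string1 = '%(example_1).40s-a%(example-2)s_-%(example3)s_s1'
string2 = 'abc123%(example_1).40s-a%(example-2)s_-%(example3)s_s1'
string3 = 'abc123%(example_1).40s-a%(example-2)s_-%(examples3).40s'

# Without the filter, the match at the start of string1 leaves an empty field.
print(SIMPLE.split(string1))           # ['', '-a', '_-', '_s1']

# The simple pattern stops at the "s" inside "(examples3)" and leaks a fragment.
print(literal_parts(string3, SIMPLE))  # ['abc123', '-a', '_-', '3).40s']

# The alternation consumes the whole parenthesised name first.
print(literal_parts(string2))          # ['abc123', '-a', '_-', '_s1']
print(literal_parts(string3))          # ['abc123', '-a', '_-']
```

Order matters in the alternation: the regex engine tries the left branch first, so the parenthesised form has to come before the simple fallback.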