Tags: python, python-3.x, regex, string-matching, python-regex

Regular expression for finding a sub-string


I am trying to find all occurrences of a sub-string using a regular expression. The sub-string has three parts: it starts with one or more 'A', followed by one or more 'N', and ends with one or more 'A'. For example, parsing the string 'AAANAANABNA' should give two sub-strings, 'AAANAA' and 'AANA'. I have tried the code below.

import regex as re  # third-party "regex" module from PyPI; the built-in re has no overlapped option
reg_a = 'A+N+A+'
s = 'AAANAANABNA'
sub_str = re.findall(reg_a, s, overlapped=True)
print(sub_str)

This gives the following output:

['AAANAA', 'AANAA', 'ANAA', 'AANA', 'ANA']

But I want this output:

['AAANAA', 'AANA']

That is, the trailing A's of one match should also be the leading A's of the next match. How can I get that?


Solution

  • Make sure there is no A immediately to the left of the match:

    >>> reg_a='(?<!A)A+N+A+'
    >>> print( re.findall(reg_a,s,overlapped=True) )
    ['AAANAA', 'AANA']
    

    The pattern (?<!A)A+N+A+ matches:

    • (?<!A) - a negative lookbehind that matches a location that is not immediately preceded by A
    • A+ - one or more As
    • N+ - one or more Ns
    • A+ - one or more As
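To see what the lookbehind contributes, the pattern can be split into two steps. This is only an illustrative sketch using the standard-library re module, not a replacement for the one-line pattern; the names run_starts, body and matches are made up for the example.

```python
import re

s = 'AAANAANABNA'

# Step 1: positions of every 'A' that is not immediately preceded by another 'A',
# i.e. the places where the lookbehind (?<!A) succeeds in front of an 'A'.
run_starts = [m.start() for m in re.finditer(r'(?<!A)A', s)]

# Step 2: from each of those positions, try to match A+N+A+.
body = re.compile(r'A+N+A+')
matches = []
for pos in run_starts:
    m = body.match(s, pos)  # anchored at pos; None if the pattern does not match there
    if m:
        matches.append(m.group())

print(run_starts)  # [0, 4, 7, 10]
print(matches)     # ['AAANAA', 'AANA']
```

Only the starts at 0 and 4 are followed by a full A+N+A+ run, so the shorter candidates such as 'AANAA' and 'ANA', which begin in the middle of a run of As, never appear.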

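    Note that you can get the same matches with the built-in re module, too, by wrapping the pattern in a lookahead with a capturing group. The lookahead consumes no characters, so overlapping matches are still found: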

    >>> import re
    >>> re_a = r'(?=(?<!A)(A+N+A+))'
    >>> print( re.findall(re_a, s) )
    ['AAANAA', 'AANA']
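If the start positions are needed as well, the same lookahead pattern can be wrapped in a small function. This is a minimal sketch with the built-in re module; the names CHAINED and find_chained are hypothetical and chosen only for this example.

```python
import re

# Lookahead with a capturing group: the overall match is empty, so the scan
# advances one character at a time and overlapping sub-strings are reported.
CHAINED = re.compile(r'(?=(?<!A)(A+N+A+))')

def find_chained(text):
    """Return (start, sub-string) pairs for each A+N+A+ match that
    begins at the first A of a run of As."""
    return [(m.start(1), m.group(1)) for m in CHAINED.finditer(text)]

print(find_chained('AAANAANABNA'))  # [(0, 'AAANAA'), (4, 'AANA')]
print(find_chained('ANANA'))        # [(0, 'ANA'), (2, 'ANA')]
```

In both results the trailing As of one match are the leading As of the next, which is the chaining asked for in the question.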