When I run the following code:
import re
s = 'baaaad'
l = re.findall(r'((a)(?=a))', s)
print l
for elem in l:
    print ''.join(elem)
I get the output:
[('a', 'a'), ('a', 'a'), ('a', 'a')]
aa
aa
aa
which is as expected. But when I try the corresponding strategy for lookbehind, i.e.:
s = 'baaaad'
l = re.findall(r'((?<=b)(a))', s)
print l
for elem in l:
    print ''.join(elem)
I get:
[('a', 'a')]
aa
I was expecting to get:
[('b', 'a')]
ba
Why do I get this (to me) unexpected behavior? If I am doing something wrong, what is it, and how do I fix it?
Thanks!
You seem to think that one of the groups in the output is from (a), and the other is from the lookahead or lookbehind. That's not the case. One of the groups is (a), and the other is from the parentheses surrounding your entire regex:
 v    v     not these
((?<=b)(a))
^         ^ these
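One way to check this, as a minimal sketch: drop the outer parentheses and see which groups remain. The variable names here are only for illustration.

```python
import re

s = 'baaaad'

# With the outer parentheses there are two capturing groups,
# so findall returns a list of 2-tuples.
with_outer = re.findall(r'((?<=b)(a))', s)

# Without them only (a) captures, so findall returns plain strings.
without_outer = re.findall(r'(?<=b)(a)', s)

# The parentheses of (?<=...) are not counted as a group.
group_count = re.compile(r'((?<=b)(a))').groups
```

Here `with_outer` is `[('a', 'a')]`, `without_outer` is `['a']`, and `group_count` is 2, not 3.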
The lookahead does not match a, and the lookbehind does not match b. They match a position in the string after which an a occurs, or before which a b occurs. They don't match any actual characters. Thus, both your regexes only match a, with restrictions on what might come before or after, and both capturing groups in both regexes only capture a.
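As for how to fix it: if you want b to show up in the output, it has to be matched by a capturing group rather than asserted by a lookbehind. A small sketch, assuming the goal is the [('b', 'a')] result from the question (the pattern (b)(a) is one possible choice, not the only one):

```python
import re

s = 'baaaad'

# The lookbehind only asserts that 'b' sits just before the match;
# the match itself covers the single 'a' at index 1.
m = re.search(r'(?<=b)a', s)
matched_text = m.group(0)
matched_span = m.span()

# Capturing 'b' instead of asserting it puts it into the result.
pairs = re.findall(r'(b)(a)', s)
joined = [''.join(p) for p in pairs]
```

This gives `matched_text == 'a'` with `matched_span == (1, 2)`, while `pairs` is `[('b', 'a')]` and `joined` is `['ba']`. Note that `(b)(a)` consumes the b, unlike the lookbehind version, which matters if matches are allowed to sit next to each other.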