iphone objective-c regex regexkitlite

How can I write a regex that matches words that overlap themselves?


I'm trying to match a word both forwards and backwards in a string, but the regex isn't catching all the matches. For example, to search for the word "AB" in the string "AAABAAABAAA", I use the regex /AB|BA/, but it only matches the two "AB" substrings and ignores the two "BA" substrings.

I'm using RegexKitLite on the iPhone, but I think this is a more general regex problem (I see the same behavior in online regex testers). Nevertheless, here's the code I'm using to enumerate the matches:

[@"AAABAAABAAA" enumerateStringsMatchedByRegex:@"AB|BA" usingBlock:
 ^(NSInteger captureCount,
   NSString * const capturedStrings[captureCount],
   const NSRange capturedRanges[captureCount],
   volatile BOOL * const stop) { 
     NSLog(@"%@", capturedStrings[0]);
 }];

Output:

AB
AB

Solution

  • I don't know which online tester you tried, but http://www.regextester.com/ (for example) will not use the same character in more than one match. In this case, once AB has matched at the start of ABA, the B is consumed and is no longer available for the BA match. It's purely a guess that RegexKitLite is implemented similarly.

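    Even if you don't consider the mirrored variant, the original search string may overlap with itself. For example, if you search for ABCA|ACBA in ABCABCACBACBA, you'll get only two of the four matches (ABCA occurs at offsets 0 and 3, ACBA at offsets 6 and 9, and each occurrence shares a character with its neighbour). Searching in either direction gives the same result.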

    It should be possible to find matches incrementally, but perhaps not with RegexKitLite.
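
One way to read "incrementally" is to restart the search one character past the start of each match, rather than past its end. The sketch below is only an illustration of that idea. It does not use RegexKitLite: it assumes the POSIX <regex.h> functions (regcomp, regexec, regfree) are available, and it is written in plain C so that it can be compiled into an Objective-C project unchanged. The helper name find_overlapping and its parameters are hypothetical. POSIX extended syntax also differs from the ICU syntax RegexKitLite uses, so more complex patterns may need adjusting.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of RegexKitLite.
   Stores the start offset of every match of `pattern` in `text`,
   overlapping matches included, in `starts` (at most `max_matches`).
   Returns the number of offsets stored, or -1 if the pattern
   does not compile. */
int find_overlapping(const char *pattern, const char *text,
                     size_t starts[], int max_matches)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    int count = 0;
    size_t offset = 0;
    regmatch_t m;

    while (count < max_matches && text[offset] != '\0') {
        /* Past the first search, text + offset is not the real
           beginning of the string, so tell regexec that ^ must not
           match there. */
        int flags = (offset > 0) ? REG_NOTBOL : 0;

        if (regexec(&re, text + offset, 1, &m, flags) != 0)
            break; /* no further match */

        starts[count++] = offset + (size_t)m.rm_so;

        /* Resume one character after the START of this match, not
           after its end, so the next match may reuse characters
           that this one already covered. */
        offset += (size_t)m.rm_so + 1;
    }

    regfree(&re);
    return count;
}
```

With the strings from the question, find_overlapping("AB|BA", "AAABAAABAAA", starts, 16) should store the offsets 2, 3, 6 and 7, that is, both "AB" and both "BA" occurrences. The cost is one regexec call per match, each rescanning from the new offset, which is fine for short strings but worth keeping in mind for long ones.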