
Getting multiple matches within a string using regex in Perl


I have read this similar question and tried my code several times, but I keep getting the same unwanted output.

Let's assume the string I'm searching is "I saw wilma yesterday". The regex should capture each run of word characters ending in an 'a', together with up to 5 of the characters (spaces included) that follow it.

The code I wrote is the following:

$_ = "I saw wilma yesterday";

if (@m = /(\w+)a(.{5,})?/g){
    print "found " . @m . " matches\n";

    foreach(@m){
        print "\t\"$_\"\n";
    }
}

However, I keep getting this output:

found 2 matches
    "s"
    "w wilma yesterday"

whereas I expected the following:

found 3 matches
    "saw wil"
    "wilma yest"
    "yesterday"

Then I noticed that the values stored in @m are just $1 and $2, i.e. the contents of the two capture groups, not the whole matched strings.
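A minimal sketch of what happens in the question's code (the array names are only for illustration): in list context, a m//g match returns the capture groups of every match rather than the whole matched text, and because the greedy (.{5,})? swallows the rest of the string, there is only one match to begin with.

```perl
use strict;
use warnings;

my $str = "I saw wilma yesterday";

# In list context, m//g returns ($1, $2, ...) for every match, not the
# whole matched text. The question's pattern matches only once, so @m
# holds exactly that match's $1 ("s") and $2 (the rest of the string).
my @m = ($str =~ /(\w+)a(.{5,})?/g);
print scalar(@m), " elements: ", join(" | ", @m), "\n";

# Wrapping the whole pattern in a single group returns the full match text
# instead, but there is still only one match, because .{5,} is greedy and
# eats everything up to the end of the string.
my @whole = ($str =~ /(\w+a(?:.{5,})?)/g);
print join(" | ", @whole), "\n";
```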

Since the /g flag is set and I don't think the problem lies in the regex itself, how can I get the desired output?


Solution

  • You can try this pattern, which allows overlapping results:

    (?=\b(\w+a.{1,5}))
    

    or

    (?=(?i)\b([a-z]+a.{0,5}))
    

    Example:

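    use strict;
    use warnings;

    my $str = "I saw wilma yesterday";
    # The capturing group sits inside a zero-width lookahead, so successive /g matches may overlap
    my @matches = ($str =~ /(?=\b([a-z]+a.{0,5}))/gi);
    print join("\n", @matches), "\n";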
    

    More explanation:

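    A regex with /g can't return overlapping matches: once the engine has consumed ("eaten") a character, the next match starts after it, so that character can't be consumed a second time. The trick to get around this constraint is to put a capturing group inside a lookahead. A lookahead only checks what follows the current position without consuming it, so the overall match is zero-length, /g moves on by just one character before trying again, and the captured text can overlap with the previous capture.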

    To see this behaviour more clearly, try the example without the word boundary (\b): the lookahead can then also succeed in the middle of a word, so you additionally get suffixes such as "ilma yest" and "lma yest".
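The following sketch contrasts a plain /g match with the lookahead version from the answer, using the same [a-z] pattern in both so that only the lookahead differs. The array names @plain and @overlap are hypothetical.

```perl
use strict;
use warnings;

my $str = "I saw wilma yesterday";

# Plain /g: each match consumes its characters, so the next search resumes
# right after "saw wil", in the middle of "wilma", and "wilma yest" is lost.
my @plain = ($str =~ /\b([a-z]+a.{0,5})/gi);

# Lookahead: the match itself is zero-width, so /g only advances one
# character before trying again and the captures are free to overlap.
my @overlap = ($str =~ /(?=\b([a-z]+a.{0,5}))/gi);

print "plain:     ", join(" | ", @plain), "\n";
print "lookahead: ", join(" | ", @overlap), "\n";
```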
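Here is a sketch of the experiment with the word boundary removed, as a variant of the answer's example (@no_boundary is an illustrative name). Without \b, every position inside a word is a candidate start, so each word that contains an 'a' also yields its shorter suffixes.

```perl
use strict;
use warnings;

my $str = "I saw wilma yesterday";

# Same lookahead, minus \b: the lookahead can now succeed at positions
# inside a word as well, e.g. at the "i" of "wilma" ("ilma yest").
my @no_boundary = ($str =~ /(?=([a-z]+a.{0,5}))/gi);
print "$_\n" for @no_boundary;
```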
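The two patterns offered in the answer agree on the question's string but are not strictly equivalent: \w+ also matches digits and underscores, and .{1,5} requires at least one character after the 'a', whereas .{0,5} accepts none. This sketch uses a made-up test string, "I saw Lima", to show the second difference; the variable names are hypothetical.

```perl
use strict;
use warnings;

# On the question's string both patterns give the same three captures.
my $str = "I saw wilma yesterday";
my @p1 = ($str =~ /(?=\b(\w+a.{1,5}))/g);
my @p2 = ($str =~ /(?=(?i)\b([a-z]+a.{0,5}))/g);

# On a word ending in "a" (illustrative string), .{1,5} finds nothing after
# the final "a" and fails, while .{0,5} still succeeds with zero characters.
my $other = "I saw Lima";
my @q1 = ($other =~ /(?=\b(\w+a.{1,5}))/g);
my @q2 = ($other =~ /(?=(?i)\b([a-z]+a.{0,5}))/g);

print "pattern 1: ", join(" | ", @q1), "\n";
print "pattern 2: ", join(" | ", @q2), "\n";
```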