Perl: How to get multiple regex captures in a structured way?


I am trying to get all occurrences of a group of patterns in an arbitrary string, much like this:

use Data::Dumper;

my $STRING = "I have a blue cat. That cat is nice, but also quite old. She is always bored.";

foreach (my @STOPS = $STRING =~ m/(?<FINAL_WORD>\w+)\.\s*(?<FIRST_WORD>\w+)/g ) {

  print Dumper \%+, \@STOPS;
}

But the outcome is not what I expected, and I don't fully understand why:

$VAR1 = {
          'FINAL_WORD' => 'old',
          'FIRST_WORD' => 'She'
        };
$VAR2 = [
          'cat',
          'That',
          'old',
          'She'
        ];
$VAR1 = {
          'FINAL_WORD' => 'old',
          'FIRST_WORD' => 'She'
        };
$VAR2 = [
          'cat',
          'That',
          'old',
          'She'
        ];
$VAR1 = {
          'FINAL_WORD' => 'old',
          'FIRST_WORD' => 'She'
        };
$VAR2 = [
          'cat',
          'That',
          'old',
          'She'
        ];
$VAR1 = {
          'FINAL_WORD' => 'old',
          'FIRST_WORD' => 'She'
        };
$VAR2 = [
          'cat',
          'That',
          'old',
          'She'
        ];

If there is no better solution, I could live with what ends up in @STOPS and omit the loop. But I would prefer to get every pair of matches separately, and I don't see a way to do that.

But why then is the loop executed multiple times anyway?

Thank you in advance, and Regards,

Mazze


Solution

  • You need to use a while loop, not a for loop:

    while ($STRING =~ m/(?<FINAL_WORD>\w+)\.\s*(?<FIRST_WORD>\w+)/g ) {
        print Dumper \%+;
    }
    

    Output:

    $VAR1 = {
              'FIRST_WORD' => 'That',
              'FINAL_WORD' => 'cat'
            };
    $VAR1 = {
              'FIRST_WORD' => 'She',
              'FINAL_WORD' => 'old'
            };
    

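    In the for loop, the match is evaluated in list context, so /g returns the captures of all matches at once, and they are stored in @STOPS before the loop body ever runs. The loop then iterates once per element of @STOPS (four times here), and on every iteration %+ still holds the captures of the last successful match. The while loop evaluates the match in scalar context, so each iteration advances to the next match and %+ reflects that match only.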

    According to perldoc perlretut:

    The modifier /g stands for global matching and allows the matching operator to match within a string as many times as possible. In scalar context, successive invocations against a string will have /g jump from match to match, keeping track of position in the string as it goes along. You can get or set the position with the pos() function.
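
If the pairs should be kept rather than printed, the same while loop can collect them into a list of hashes. The following is only a minimal sketch: the helper name sentence_boundaries and the END_POS key are made up for illustration and are not part of the original answer. It copies %+ on each iteration (the hash is overwritten by the next match) and records pos() to show the position tracking described in the quote above.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (name chosen for illustration): returns one hash
# reference per "word. word" boundary found in $text.
sub sentence_boundaries {
    my ($text) = @_;
    my @pairs;

    # Scalar-context /g: each iteration resumes where the previous match ended.
    while ($text =~ m/(?<FINAL_WORD>\w+)\.\s*(?<FIRST_WORD>\w+)/g) {
        push @pairs, {
            %+,                       # copy the named captures of this match
            END_POS => pos($text),    # assumed extra key: offset just past this match
        };
    }
    return @pairs;
}

my $STRING = "I have a blue cat. That cat is nice, but also quite old. She is always bored.";
my @pairs  = sentence_boundaries($STRING);

# Each element is a separate hash with FINAL_WORD, FIRST_WORD and END_POS.
print Dumper \@pairs;
```

Copying with { %+ } matters here: storing a reference to %+ itself would make every element show the captures of the most recent match, which is the same effect seen in the for loop of the question.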