
How do I capture multiple "partners" of a string in a Perl regex?


I have a large multiline file to parse, which I have slurped into a single string in Perl. So it ends up like this:

my $string = "foo1 randomtext bar1 randomtext bar2 randomtext bar3/foo2 randomtext bar4 randomtext bar5 randomtext bar6 bar7/foo3 randomtext bar8 randomtext bar9/";

It consists of a set of records, each with a header entry (foo + number) and each terminated by a separator symbol, "/" in this case.

I'm trying to capture the header info (foo) and some of the text further down in each entry (bar + number). I would like to pair the header with every instance of "bar" in its record, so that the specific foo/bar relationships within each entry are preserved.

I want the output to look like this:

    foo1_bar1
    foo1_bar2
    foo1_bar3
    foo2_bar4
    foo2_bar5
    foo2_bar6
    foo2_bar7
    foo3_bar8
    foo3_bar9

I have tried various regexes, adding ? after the .+ to make it minimal rather than greedy, and also matching the \/ record separator after (bar\d) (which makes it find only the final bar of each record rather than the first). For example:

    while ($string =~ m/(foo\d).+?(bar\d)+/g)
    {
        print "$1_$2\n";
    }

which returns

    foo1_bar1
    foo2_bar4
    foo3_bar8

So I get just the first bar for each foo. The + after (bar\d) doesn't turn this into multiple matches, and that's my problem.

Any thoughts?


Solution

  • My approach is to split at "/", extract the "foo", and then use a simple regex to catch the bars:

    use strict; 
    use warnings;
    
    my $string = "foo1 randomtext bar1 randomtext bar2 randomtext bar3/foo2 randomtext bar4 randomtext bar5 randomtext bar6 bar7/foo3 randomtext bar8 randomtext bar9/";
    
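    foreach my $chunk (split(/\//, $string)) {
       # Grab the record's header with a plain match, which also works when the
       # chunk contains newlines; skip chunks that have no "foo" at all.
       my ($foo) = $chunk =~ m|(foo\d)| or next;
       # In scalar context, each /g iteration yields the next "bar" in the chunk.
       while ($chunk =~ m|(bar\d)|g) {
          print $foo . "_$1\n";
       }
    }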
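
The reason the original pattern finds only one bar per foo is that a quantified capture group such as (bar\d)+ keeps only its last repetition, and since the bars here are not adjacent, it repeats just once anyway. A variation on the split approach above, as a minimal sketch: it collects the pairs into an array instead of printing inside the loop, and uses a match with /g in list context to pull out every bar of a record at once. The names @pairs and $record are illustrative, and \d+ (rather than \d) is an assumption that the numbers may have more than one digit.

```perl
use strict;
use warnings;

my $string = "foo1 randomtext bar1 randomtext bar2 randomtext bar3/foo2 randomtext bar4 randomtext bar5 randomtext bar6 bar7/foo3 randomtext bar8 randomtext bar9/";

my @pairs;    # illustrative name: collected "foo_bar" strings
for my $record (split m{/}, $string) {
    # Header: the first foo+digits in the record; skip records without one.
    my ($foo) = $record =~ /(foo\d+)/ or next;

    # In list context, m//g returns every capture found in the record.
    # The braces in ${foo} stop Perl from reading the variable name as $foo_.
    push @pairs, map { "${foo}_$_" } $record =~ /(bar\d+)/g;
}

print "$_\n" for @pairs;
```

Keeping the pairs in an array makes it easy to sort, deduplicate, or write them elsewhere later; if printing is all that is needed, the while loop above does the same job without the intermediate list.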