I have a large multiline file to parse, which I have slurped into a single string in Perl. So it ends up like this:
my $string = "foo1 randomtext bar1 randomtext bar2 randomtext bar3/foo2 randomtext bar4 randomtext bar5 randomtext bar6 bar7/foo3 randomtext bar8 randomtext bar9/";
It consists of a set of records, each one with a header entry (foo+number), and each is separated by a symbol; "/" in this case.
I'm trying to capture the header info (foo) and some of the text further down in each entry (bar+number). In each case I would like to capture the header info paired with each instance of "bar", to maintain the specific foo and bar relationships within each entry.
I want the output to look like this:
foo1_bar1
foo1_bar2
foo1_bar3
foo2_bar4
foo2_bar5
foo2_bar6
foo2_bar7
foo3_bar8
foo3_bar9
I have tried various regexes, with combinations of ? after the .+ to make it minimal rather than maximal, including matching the \/ record separator after (bar\d) (which makes it only find the final bar of the record, rather than the first), e.g.:
while ($string =~ m/(foo\d).+?(bar\d)+/g)
{
    print "$1_$2\n";
}
which returns
foo1_bar1
foo2_bar4
foo3_bar8
So just the first bar for each foo. Basically the + after the (bar\d) doesn't make this a multiple match and that's my problem.
Any thoughts?
My approach is to split at "/", get the "foo", and then use a simple regex to catch the bars:
use strict;
use warnings;
my $string = "foo1 randomtext bar1 randomtext bar2 randomtext bar3/foo2 randomtext bar4 randomtext bar5 randomtext bar6 bar7/foo3 randomtext bar8 randomtext bar9/";
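# Each record ends with "/", so split on it and handle one record at a time.
foreach my $chunk (split /\//, $string) {
    # Capture the header in list context; skip any chunk that has no header.
    my ($foo) = $chunk =~ /(foo\d)/ or next;
    # In scalar context, //g steps through every bar in this record in turn.
    while ($chunk =~ /(bar\d)/g) {
        print "${foo}_$1\n";
    }
}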
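As for why the original attempt stops at the first bar: a quantified capture group such as (bar\d)+ only keeps the text of its last repetition, and since the bars are separated by other text the group repeats just once anyway. The next //g iteration then has to start from a new foo. If you would rather collect the pairs than print them straight away, here is a minimal sketch of the same split-then-match idea. The helper name foo_bar_pairs is my own invention, and I have assumed the numbers may run to more than one digit, hence \d+ instead of \d:

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from the question):
# returns a list of "fooN_barM" strings, in the order they appear.
sub foo_bar_pairs {
    my ($string) = @_;
    my @pairs;
    for my $chunk (split m{/}, $string) {
        # Header of this record; skip any chunk that has none.
        my ($foo) = $chunk =~ /(foo\d+)/ or next;
        # In list context, //g returns every captured bar of the record at once.
        push @pairs, map { "${foo}_$_" } $chunk =~ /(bar\d+)/g;
    }
    return @pairs;
}

my $string = "foo1 randomtext bar1 randomtext bar2 randomtext bar3/foo2 randomtext bar4 randomtext bar5 randomtext bar6 bar7/foo3 randomtext bar8 randomtext bar9/";
print "$_\n" for foo_bar_pairs($string);
```

With the sample string this should print the same nine lines as the desired output in the question. Returning a list rather than printing inside the loop is only a design choice; it makes the result easier to reuse or check.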