I have some data that I need to analyze. The data is multi-line, and the blocks are separated from each other by a blank line. It looks something like this:
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
I need to select only the data blocks that have a particular Property present. For example, only those that have Property 4, or only those that have both Property 3 and Property 6. I might also need to select by the value of a Property: for example, only those blocks that have Property 3 with the value 'an'.
How would I do this in Perl? I tried splitting on "\n", but that didn't seem to work properly. Am I missing something?
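The secret to making this task simple is to use the $/ variable (the input record separator) to put Perl into "paragraph mode". Setting $/ to the empty string makes each read return one whole blank-line-separated block, whereas splitting on "\n" gives you individual lines and loses the block boundaries. That makes it easy to process your records one at a time. You can then filter them with something like grep.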
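#!/usr/bin/perl
use strict;
use warnings;

# Paragraph mode: with $/ set to the empty string, each read returns
# one blank-line-separated block. "local" restores $/ after the do block.
my @data = do {
    local $/ = '';
    <DATA>;
};

# /m lets ^ match at the start of every line inside a block.
my @with_4   = grep { /^Property 4:/m } @data;
my @with_3   = grep { /^Property 3:/m } @data;
my @with_3_6 = grep { /^Property 6:/m } @with_3;

# Number of blocks that have both Property 3 and Property 6.
print scalar(@with_3_6), "\n";
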
__DATA__
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
In that example I'm processing each record as plain text. For more complex work, I'd probably turn each record into a hash.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @data;
{
    local $/ = '';    # paragraph mode, limited to this block
    while (<DATA>) {
        chomp;        # in paragraph mode this strips all trailing newlines
        my @rec = split /\n/;
        my %prop;
        foreach my $r (@rec) {
            # The limit of 2 keeps any later ": " inside the value.
            my ($k, $v) = split /:\s+/, $r, 2;
            $prop{$k} = $v;
        }
        push @data, \%prop;
    }
}

# Each element of @data is now a hash reference, one per block.
my @with_4   = grep { exists $_->{'Property 4'} } @data;
my @with_3_6 = grep { exists $_->{'Property 3'} and
                      exists $_->{'Property 6'} } @data;
my @with_3an = grep { exists $_->{'Property 3'} and
                      $_->{'Property 3'} eq 'an' } @data;

print Dumper @with_3an;

__DATA__
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
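
If you end up writing a lot of these filters, you could wrap the checks in a small helper. This is only a sketch: matches() is a name I've made up for illustration (it doesn't come from any module), and the sample records are hard-coded in the same shape that the hash-based script builds, so the snippet runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the scripts above: returns true when
# the record has every key listed in $required and every key/value pair
# given in $equals.
sub matches {
    my ($rec, $required, $equals) = @_;
    for my $k (@{ $required || [] }) {
        return 0 unless exists $rec->{$k};
    }
    for my $k (keys %{ $equals || {} }) {
        return 0 unless exists $rec->{$k} and $rec->{$k} eq $equals->{$k};
    }
    return 1;
}

# Sample records as hash references, mirroring the three example blocks.
my @data = (
    { 'Property 1' => '1234', 'Property 2' => '34546', 'Property 3' => 'ACBGD' },
    { 'Property 1' => '1234', 'Property 4' => '4567' },
    { 'Property 1' => 'just', 'Property 3' => 'an',
      'Property 5' => 'simple', 'Property 6' => 'example' },
);

# Blocks with both Property 3 and Property 6.
my @with_3_6 = grep { matches($_, ['Property 3', 'Property 6']) } @data;

# Blocks where Property 3 has the value 'an'.
my @with_3an = grep { matches($_, [], { 'Property 3' => 'an' }) } @data;

print scalar(@with_3_6), " ", scalar(@with_3an), "\n";
```

The first argument list covers "must be present" and the optional hash covers "must have this value", so new combinations are just different arguments rather than new grep blocks.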