perl split multiline text-parsing data-analysis

Parsing multiline data in Perl

I have some data that I need to analyze. The data is multiline, and the blocks are separated by blank lines. It looks something like this:

Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example

I need to select the data blocks that have a particular property present: for example, only those that have Property 4, or only those that have both Property 3 and Property 6. I might also need to select based on the values of these properties, for example only those blocks where Property 3 has the value 'an'.

How would I do this in Perl? I tried splitting on "\n", but that didn't seem to work properly. Am I missing something?

Solution

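Splitting on "\n" gives you individual lines rather than blocks. The secret to making this task simple is to set the $/ variable (the input record separator) to the empty string, which puts Perl into "paragraph mode". Each read then returns one whole block, with blocks delimited by one or more blank lines, so it is easy to process your records one at a time. You can then filter them with something like grep.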

#!/usr/bin/perl

use strict;
use warnings;

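# Read all of DATA as a list of records. Setting $/ to the empty string
# enables paragraph mode, so each record is one block of lines.
my @data = do {
  local $/ = '';
  <DATA>;
};

# /m lets ^ match at the start of any line within a record.
my @with_4   = grep { /^Property 4:/m } @data;

# Blocks with both properties: filter once, then filter the result.
my @with_3   = grep { /^Property 3:/m } @data;
my @with_3_6 = grep { /^Property 6:/m } @with_3;

# Print the number of matching records.
print scalar(@with_3_6), "\n";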

__DATA__
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example

In that example I'm processing each record as plain text. For more complex work, such as filtering on a property's value, I'd probably turn each record into a hash.

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my @data;

{
  local $/ = '';

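  while (<DATA>) {
    chomp;  # in paragraph mode this removes all trailing newlines

    # Split the record into lines, then each line into key and value.
    my @rec = split /\n/;
    my %prop;
    foreach my $r (@rec) {
      # The limit of 2 keeps any ': ' that appears inside the value.
      my ($k, $v) = split /:\s+/, $r, 2;
      $prop{$k} = $v;
    }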

    push @data, \%prop;
  }
}

my @with_4   = grep { exists $_->{'Property 4'} } @data;

my @with_3_6 = grep { exists $_->{'Property 3'} and
                      exists $_->{'Property 6'} } @data;

my @with_3an = grep { exists $_->{'Property 3'} and
                      $_->{'Property 3'} eq 'an' } @data;

print Dumper @with_3an;

__DATA__
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
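
If the data is already in a string rather than coming from a filehandle, the same two ideas (break the text into blank-line-separated blocks, then filter hashes with grep) can be wrapped in small helpers. What follows is only a sketch under stated assumptions: parse_blocks and select_blocks are hypothetical names that are not part of the answer above, and the sketch uses split /\n{2,}/ on a string instead of paragraph mode.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: split a string into blocks on runs of blank lines
# and turn each block into a hash of property name => value.
sub parse_blocks {
  my ($text) = @_;

  my @blocks;
  foreach my $block (split /\n{2,}/, $text) {
    my %prop;
    foreach my $line (split /\n/, $block) {
      # The limit of 2 keeps any later ': ' inside the value.
      my ($k, $v) = split /:\s*/, $line, 2;
      next unless defined $k and length $k;
      $prop{$k} = defined $v ? $v : '';
    }
    push @blocks, \%prop if %prop;
  }

  return @blocks;
}

# Hypothetical helper: keep the blocks that satisfy every criterion.
# A criterion value of undef means "the property must exist"; any other
# value means "the property must exist and equal this string".
sub select_blocks {
  my ($criteria, @blocks) = @_;

  return grep {
    my $block = $_;
    my $ok    = 1;
    foreach my $name (keys %$criteria) {
      my $want = $criteria->{$name};
      if (!exists $block->{$name}
          or (defined $want and $block->{$name} ne $want)) {
        $ok = 0;
        last;
      }
    }
    $ok;
  } @blocks;
}

my $text = <<'END';
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
END

my @blocks   = parse_blocks($text);
my @with_4   = select_blocks({ 'Property 4' => undef }, @blocks);
my @with_3_6 = select_blocks({ 'Property 3' => undef,
                               'Property 6' => undef }, @blocks);
my @with_3an = select_blocks({ 'Property 3' => 'an' }, @blocks);

printf "%d blocks, %d with 4, %d with 3 and 6, %d with 3 = 'an'\n",
  scalar @blocks, scalar @with_4, scalar @with_3_6, scalar @with_3an;
```

Passing the criteria as a hash reference keeps each filter to one line, at the cost of only supporting "exists" and "equals" tests. For anything more elaborate, a plain grep block like the ones above is probably clearer.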