
Parsing multiline data in Perl


I have some data that I need to analyze. The data spans multiple lines, and the blocks are separated by a blank line. It looks something like this:

    Property 1: 1234
    Property 2: 34546
    Property 3: ACBGD

    Property 1: 1234
    Property 4: 4567

    Property 1: just
    Property 3: an
    Property 5: simple
    Property 6: example

I need to select the data blocks that contain particular properties: for example, only those that have Property 4, or only those that have both Property 3 and Property 6. I might also need to select based on the values of these properties, for example only the blocks that have Property 3 with the value 'an'.

How would I do this in Perl? I tried splitting the data on "\n", but that didn't seem to work properly. Am I missing something?


Solution

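  • The key to making this task simple is to set the input record separator, $/, to the empty string. That puts Perl into "paragraph mode", where each read returns a whole blank-line-separated block instead of a single line (splitting on "\n" only gives you individual lines, which loses the grouping into blocks). That makes it easy to process your records one at a time, and you can then filter them with something like grep.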

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my @data = do {
      local $/ = '';
      <DATA>;
    };
    
    my @with_4   = grep { /^Property 4:/m } @data;
    
    my @with_3   = grep { /^Property 3:/m } @data;
    my @with_3_6 = grep { /^Property 6:/m } @with_3;
    
    print scalar(@with_3_6), "\n";
    
    __DATA__
    Property 1: 1234
    Property 2: 34546
    Property 3: ACBGD
    
    Property 1: 1234
    Property 4: 4567
    
    Property 1: just
    Property 3: an
    Property 5: simple
    Property 6: example
    

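The example above only tests whether a property is present. As a minimal sketch, assuming each value sits on the same line directly after `Property N: `, a value test works the same way with a tighter regex: the /m modifier lets ^ and $ match at line boundaries inside a block. This version also reads from an in-memory string (via `open` on a scalar reference) rather than `__DATA__`, so it runs on its own; the variable names are illustrative.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Sample input held in a string so the sketch is self-contained;
# in practice you would read from a file or from DATA instead.
my $text = <<'END';
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
END

# Read from the string; local $/ = '' enables paragraph mode, so each
# readline returns one blank-line-separated block.
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my @data = do { local $/ = ''; <$fh> };
close $fh;

# Presence of two properties, tested in a single grep.
my @with_3_6 = grep { /^Property 3:/m && /^Property 6:/m } @data;

# Presence plus value: ^ and $ (with /m) anchor the whole line, so this
# matches "an" only when it is the entire value of Property 3.
my @with_3an = grep { /^Property 3: an$/m } @data;

print scalar(@with_3an), "\n";
print @with_3an;
```

Because the regex is anchored at both ends of the line, a value such as "another" would not match; drop the `$` if a prefix match is what you want.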
    In that example I'm processing each record as plain text. For more complex work, I'd probably turn each record into a hash.

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use Data::Dumper;
    
    my @data;
    
    {
      local $/ = '';
    
      while (<DATA>) {
        chomp;
    
        my @rec = split /\n/;
        my %prop;
        foreach my $r (@rec) {
          # A limit of 2 keeps any later ':' inside the value intact.
          my ($k, $v) = split /:\s*/, $r, 2;
          $prop{$k} = $v;
        }
    
        push @data, \%prop;
      }
    }
    
    my @with_4   = grep { exists $_->{'Property 4'} } @data;
    
    my @with_3_6 = grep { exists $_->{'Property 3'} and
                          exists $_->{'Property 6'} } @data;
    
    my @with_3an = grep { exists $_->{'Property 3'} and
                          $_->{'Property 3'} eq 'an' } @data;
    
    print Dumper @with_3an;
    
    __DATA__
    Property 1: 1234
    Property 2: 34546
    Property 3: ACBGD
    
    Property 1: 1234
    Property 4: 4567
    
    Property 1: just
    Property 3: an
    Property 5: simple
    Property 6: example
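If the whole input is already in a single string (for example, slurped from a file), another option is to split it on blank lines, /\n\s*\n/, rather than on every "\n". The sketch below does that, builds a hash per block as in the answer above, and adds a small helper, `block_matches` (a hypothetical name, not from any module), that takes the required properties, with `undef` meaning "present with any value". Treat it as an illustration under those assumptions rather than a drop-in replacement.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my $text = <<'END';
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
END

# Turn one block of "Key: value" lines into a hash reference.
sub parse_block {
  my ($block) = @_;
  my %prop;
  for my $line (split /\n/, $block) {
    # Limit of 2 keeps any later ':' inside the value intact.
    my ($k, $v) = split /:\s*/, $line, 2;
    $prop{$k} = $v if defined $v;   # skip lines without a colon
  }
  return \%prop;
}

# Hypothetical helper: true if the block has every key in %$want.
# A wanted value of undef means "present with any value"; otherwise
# the block's value must be string-equal to it.
sub block_matches {
  my ($block, $want) = @_;
  for my $k (keys %$want) {
    return 0 unless exists $block->{$k};
    return 0 if defined $want->{$k} && $block->{$k} ne $want->{$k};
  }
  return 1;
}

# A newline, optional whitespace, then another newline marks a block boundary.
my @data = map { parse_block($_) } split /\n\s*\n/, $text;

my @with_4   = grep { block_matches($_, { 'Property 4' => undef }) } @data;
my @with_3_6 = grep { block_matches($_, { 'Property 3' => undef,
                                          'Property 6' => undef }) } @data;
my @with_3an = grep { block_matches($_, { 'Property 3' => 'an' }) } @data;

printf "%d %d %d\n", scalar @with_4, scalar @with_3_6, scalar @with_3an;   # prints "1 1 1"
```

Passing the criteria as a hash keeps each query to one line, at the cost of only supporting exact string matches; a regex or code reference per key would be the natural extension if you need more.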