Search code examples
regexperlregex-groupmultiline

Perl: iterate through multiline match


I would like iterate through a multiline pattern in Perl, but I'm struggling with the syntax.

My input string is:


+++    STAR-WARS        2020-01-01 00:00:00+00:00
S&W    #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0  Operation success

In-universe information
-----------------------
                 Species  =  Human
                  Gender  =  Male
television series of num  =  whatever

(Number of results = 1)

Personal Details
----------------
                        First Name  =  Obi-Wan
                         Last Name  =  Kenobi
                             Alias  =  Padawan
                                    =  Jedi Knight
                                    =  Jedi General
                                    =  Jedi Master
Points to other set of information  =  whatever

(Number of results = 1)

Other attribute
---------------
Significant other  =  Satine Kryze
      Affiliation  =  Jedi Order
                   =  Galactic Republic
                   =  Rebel Alliance
       Occupation  =  Jedi

(Number of results = 1)

---    END

My desired resulting hash would be:

$VAR1 = {
          'In-universe information' => {
                                         'Gender' => 'Male',
                                         'Species' => 'Human',
                                         'results' => '1',
                                         'television series of num' => 'whatever'
                                       },
          'Other attribute' => {
                                 'Affiliation' => [
                                                    'Jedi Order',
                                                    'Galactic Republic',
                                                    'Rebel Alliance'
                                                  ],
                                 'Occupation' => 'Jedi',
                                 'Significant other' => 'Satine Kryze',
                                 'results' => '1'
                               },
          'Personal Details' => {
                                  'Alias' => [
                                               'Padawan',
                                               'Jedi Knight',
                                               'Jedi General',
                                               'Jedi Master'
                                             ],
                                  'First Name' => 'Obi-Wan',
                                  'Last Name' => 'Kenobi',
                                  'Points to other set of information' => 'whatever',
                                  'results' => '1'
                                },
          'code' => '0',
          'description' => 'Operation success'
        };

What I have come up with works well for a "single block" (e.g. Personal Details above). However, if the data contains multiple blocks, I can't figure out how to iterate through every matching block. (e.g. use while with /g)

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

local $/;
my $output = <DATA>;

my %hash;
($hash{'code'}, $hash{'description'}) = $output =~ /^RETCODE = (\d+)\s+(.*)\n/m;

if ($hash{'code'} eq "0") {
    my ($type,$data, $results) = $output =~ /([^\n]+)\n-+\n(.*)\n\n\(Number of results = (\d+)\)\n\n/sm;
    my $previousKey = "";
    while ($data =~ /(.+)$/mg) {
    my $line = $1;
    $line =~ s/(?:^ +)//g;
    my ($key, $value);
    if ($line =~ /^\s*=  /) {
        ($value) = $line =~ /^\s*=  (.*)$/;
        $hash{$type}{$previousKey} = [ $hash{$type}{$previousKey} ] unless ref($hash{$type}{$previousKey});
        push (@{$hash{$type}{$previousKey}}, $value);
    } else {
        ($key, $value) = split(/  =  /, $line);
        $hash{$type}{$key} = $value;
        $previousKey = $key;
    }
    }
    say STDERR Dumper(\%hash);
}

__DATA__

+++    STAR-WARS        2020-01-01 00:00:00+00:00
S&W    #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0  Operation success

In-universe information
-----------------------
                 Species  =  Human
                  Gender  =  Male
television series of num  =  whatever

(Number of results = 1)

Personal Details
----------------
                        First Name  =  Obi-Wan
                         Last Name  =  Kenobi
                             Alias  =  Padawan
                                    =  Jedi Knight
                                    =  Jedi General
                                    =  Jedi Master
Points to other set of information  =  whatever

(Number of results = 1)

Other attribute
---------------
Significant other  =  Satine Kryze
      Affiliation  =  Jedi Order
                   =  Galactic Republic
                   =  Rebel Alliance
       Occupation  =  Jedi

(Number of results = 1)

---    END

Few facts:

  • every "block" always contains a header, followed by newline and dashes equal to the length of the header.
  • every "block" always ends with \n, followed by (Number of results = \d+), followed by \n.
  • each key/value pair always have two spaces before and after the equal sign. i.e. / = /
  • when no key exists, assume it's an [array], and append the value to the previous key. e.g. Alias in my example above.
  • the string will always ends with --- END followed by a \n

Solution

  • According your description the section is starting with +++ ... and ending with --- END.

    Based on this information the input can be devided with regex into blocks of interest which then processed individually in a loop with a parser to build a hash.

    NOTE: the parser was slightly modified and put into subroutine

    use strict;
    use warnings;
    use feature 'say';
    
    use Data::Dumper;
    
    my @shows;
    
    my $data   = do { local $/; <DATA> };
    my @blocks = $data =~ /^(\+\+\+ .*?^---    END)/msg;
    
    push @shows, parse($_) for @blocks;
    
    say Dumper(\@shows);
    
    exit 0;
    
    sub parse {
        my $data = shift;
    
        my(@sections,$re,$r);
        
    #   Alternative block to extract show info section
    #   $re = qr/^\+\+\+\s+(\S+)\s+(\S+)\s+(\S+)\s+\S+\s+(\S+)\s+%%[^:]+?:\s+([^;]+?);%%\s+RETCODE = (\d+)\s+([^\n]+)/;
    #   $r->{info}->@{qw/show day time sw show_name code description/} = $data =~ /$re/;
    
        $re = qr/RETCODE = (\d+)\s+([^\n]+)/;
        $r->@{qw/code description/} = $data =~ /$re/;
        
        @sections = $data =~ /\n\n(.+?\n-+.*?\(Number of results = \d+\))/gs;
    
        for my $block ( @sections ) {
            my($section,@lines,$key,$value);
            @lines   = split("\n",$block);
            $section = $lines[0];
            for my $line (@lines[2..$#lines-2] ) {
                $line =~ s/^\s+//;
                if( $line =~ /^=\s+(.+)/ ) {
                    $r->{$section}{$key} = [ $r->{$section}{$key} ] unless ref $r->{$section}{$key} eq 'ARRAY';
                    push @{$r->{$section}{$key}}, $1;
                } else {
                    ($key,$value) = split(/  =  /,$line);
                    $r->{$section}{$key} = $value; 
                }
            }
            $r->{$section}{results} = $block =~ /\(Number of results = (\d+)\)/gs;
        }
        
        return $r;
    }
    
    __DATA__
    
    +++    STAR-WARS        2020-01-01 00:00:00+00:00
    S&W    #00000000
    %%SHOW NAME: Q=Kenobi;%%
    RETCODE = 0  Operation success
    
    In-universe information
    -----------------------
                     Species  =  Human
                      Gender  =  Male
    television series of num  =  whatever
    
    (Number of results = 1)
    
    Personal Details
    ----------------
                            First Name  =  Obi-Wan
                             Last Name  =  Kenobi
                                 Alias  =  Padawan
                                        =  Jedi Knight
                                        =  Jedi General
                                        =  Jedi Master
    Points to other set of information  =  whatever
    
    (Number of results = 1)
    
    Other attribute
    ---------------
    Significant other  =  Satine Kryze
          Affiliation  =  Jedi Order
                       =  Galactic Republic
                       =  Rebel Alliance
           Occupation  =  Jedi
    
    (Number of results = 1)
    
    ---    END
    
    +++    STAR-WARS        2020-01-01 00:00:00+00:00
    S&W    #00000000
    %%SHOW NAME: Q=Kenobi;%%
    RETCODE = 0  Operation success
    
    In-universe information
    -----------------------
                     Species  =  Human
                      Gender  =  Male
    television series of num  =  whatever
    
    (Number of results = 1)
    
    Personal Details
    ----------------
                            First Name  =  Obi-Wan
                             Last Name  =  Kenobi
                                 Alias  =  Padawan
                                        =  Jedi Knight
                                        =  Jedi General
                                        =  Jedi Master
    Points to other set of information  =  whatever
    
    (Number of results = 1)
    
    Other attribute
    ---------------
    Significant other  =  Satine Kryze
          Affiliation  =  Jedi Order
                       =  Galactic Republic
                       =  Rebel Alliance
           Occupation  =  Jedi
    
    (Number of results = 1)
    
    ---    END
    

    Output

    $VAR1 = [
              {
                'Other attribute' => {
                                       'Significant other' => 'Satine Kryze',
                                       'Occupation' => 'Jedi',
                                       'results' => 1,
                                       'Affiliation' => [
                                                          'Jedi Order',
                                                          'Galactic Republic',
                                                          'Rebel Alliance'
                                                        ]
                                     },
                'Personal Details' => {
                                        'results' => 1,
                                        'First Name' => 'Obi-Wan',
                                        'Alias' => [
                                                     'Padawan',
                                                     'Jedi Knight',
                                                     'Jedi General',
                                                     'Jedi Master'
                                                   ],
                                        'Points to other set of information' => 'whatever',
                                        'Last Name' => 'Kenobi'
                                      },
                'code' => '0',
                'description' => 'Operation success',
                'In-universe information' => {
                                               'television series of num' => 'whatever',
                                               'Gender' => 'Male',
                                               'results' => 1,
                                               'Species' => 'Human'
                                             }
              },
              {
                'Other attribute' => {
                                       'Affiliation' => [
                                                          'Jedi Order',
                                                          'Galactic Republic',
                                                          'Rebel Alliance'
                                                        ],
                                       'results' => 1,
                                       'Significant other' => 'Satine Kryze',
                                       'Occupation' => 'Jedi'
                                     },
                'Personal Details' => {
                                        'First Name' => 'Obi-Wan',
                                        'results' => 1,
                                        'Last Name' => 'Kenobi',
                                        'Alias' => [
                                                     'Padawan',
                                                     'Jedi Knight',
                                                     'Jedi General',
                                                     'Jedi Master'
                                                   ],
                                        'Points to other set of information' => 'whatever'
                                      },
                'code' => '0',
                'description' => 'Operation success',
                'In-universe information' => {
                                               'television series of num' => 'whatever',
                                               'results' => 1,
                                               'Gender' => 'Male',
                                               'Species' => 'Human'
                                             }
              }
            ];