Perl cut characters up to revolving regex, print to end of line


I have this data, where I want to cut out the date and time and print everything from the initials to the end of the line. I have put the initials in an array.

30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs
30th Mar 2020 5:53:27 pm Charlie Brown: DS: ICE DATA = INC1018483661
30th Mar 2020 6:42:43 pm Boris Yeltsin: Cortese's ICE logs is for the Bloomberg Runs issue
30th Mar 2020 6:43:28 pm Charlie Brown: yeap
31st Mar 2020 4:11:22 am Ishtar Johnson: VK : RE: XS2018777099 & XS2018777172 - INC1018491954
31st Mar 2020 6:31:17 am Tommy Boy: NW: RE: SABSM 6.125 YTW - INC1018495843
31st Mar 2020 7:26:40 am Tommy Boy: AP: RE: Rolling 7yrs - INC1018497102
31st Mar 2020 7:45:36 am Tommy Boy: JK: RE: Chris White books - INC1018497380

Here is the code -

#!/usr/bin/perl

use strict;
use warnings;

my @team = ("AP","II","DS","WJ", "JK","LC","BJ") ;
my ( $team_regex ) = map {qr /$_/} join "|", map {quotemeta} @team;

my @orderdTeam ;
my $filename = shift @ARGV ;
open(my $fh, '<', $filename) or die "Could not open file $filename $!";
while (my $line = <$fh> ) {
        #$line =~ /($team_regex .*)/s  ;
        $line = /($team_regex .*)/s  ;
        print "$line\n";

}
close $fh;

For some reason I get these "uninitialized value" warnings.

johnswal@NYKPWM2037968 ~
$ ./cut_date_symphony.pl fooberry
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 1.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 2.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 3.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 4.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 5.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 6.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 7.
Use of uninitialized value $_ in pattern match (m//) at ./cut_date_symphony.pl line 14, <$fh> line 8.

The commented line just prints out the whole line - it does not cut out the date or time

#$line =~ /($team_regex .*)/s  ;

So this is what I am looking for. "Tommy Boy: NW:" and "Ishtar Johnson: VK :" are part of our team, but from Europe. Only tickets for the American team members listed in the "@team" array should be displayed, and the date and time should be cut out of the line.

BJ: Bloomberg Runs
DS: ICE DATA = INC1018483661
AP: RE: Rolling 7yrs - INC1018497102
JK: RE: Chris White books - INC1018497380

Solution

  • Line 14 is this line:

    $line = /($team_regex .*)/s  ;
    

    The match operator (/.../) works on either the variable that is bound to it using the =~ operator or $_ if no such variable is given. You don't use =~, so the match operator tries to match against $_. And $_ contains no data, so Perl gives you the "uninitialized value" warning that you see. The plain = then assigns the result of that match (true or false) to $line, which overwrites the text you just read.
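
    A minimal sketch of the difference, using one line from the question. The $bound and $unbound variables and the "no initials here" string are made up for illustration:

    ```perl
    use strict;
    use warnings;

    my $line = "30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs";

    # Bound match: =~ tests $line against the pattern.
    my $bound = $line =~ /BJ:/ ? 1 : 0;

    # Unbound match: with no =~, the pattern is tested against $_ instead.
    # $_ is set here only so the example runs without warnings.
    local $_ = "no initials here";
    my $unbound = /BJ:/ ? 1 : 0;

    print "bound=$bound unbound=$unbound\n";
    ```

    The first match looks at $line and succeeds; the second never looks at $line at all, which is what happens on line 14 of the script.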

    I think you want to match the regex against the contents of $line. So you need to use =~ instead of =, as in your commented-out line.

    $line =~ /($team_regex .*)/s  ;
    

    But in a comment above you explain that you've commented this out because:

    The commented line does not cut any characters out - it prints the whole line

    And of course it does that because you've written no code to change $line in any way. But what you want is in $1 after the match, so print that instead.

    $line =~ /($team_regex .*)/s  ;
    print $1;
    

    But the regex variables like $1 only get set on a successful match, so it's important to check the match works before printing them out. You can do that by putting the match operator in an if statement.

    if ($line =~ /($team_regex .*)/s) {
      print $1;
    }
    

    Update: Oh, and that doesn't work as the team codes in your data are followed by a colon, not a space (as your regex assumes). So change it to this:

    if ($line =~ /($team_regex:.*)/s) {
      print $1;
    }
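
Putting the pieces together, the whole script might look something like the sketch below. It is not the only way to write it, and it makes a few assumptions that are not in the original: the sample lines are held in an array (@lines) instead of being read from a file so that the example is self-contained, the matches are collected in a hypothetical @kept array before printing, and a \b word boundary is added in front of the initials so they cannot match in the middle of a longer word. The /s modifier is dropped because these lines carry no trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @team = ("AP", "II", "DS", "WJ", "JK", "LC", "BJ");

# Build one alternation (AP|II|DS|...); qr// keeps it in its own group.
my $alternation = join "|", map { quotemeta } @team;
my $team_regex  = qr/$alternation/;

# Sample lines copied from the question; a real run would read them from a file.
my @lines = (
    "30th Mar 2020 5:53:18 pm Charlie Brown: BJ: Bloomberg Runs",
    "30th Mar 2020 5:53:27 pm Charlie Brown: DS: ICE DATA = INC1018483661",
    "30th Mar 2020 6:43:28 pm Charlie Brown: yeap",
    "31st Mar 2020 6:31:17 am Tommy Boy: NW: RE: SABSM 6.125 YTW - INC1018495843",
    "31st Mar 2020 7:26:40 am Tommy Boy: AP: RE: Rolling 7yrs - INC1018497102",
    "31st Mar 2020 7:45:36 am Tommy Boy: JK: RE: Chris White books - INC1018497380",
);

my @kept;
for my $line (@lines) {
    # Use $1 only when the match succeeded for this line.
    # The \b is an added assumption, not part of the original regex.
    if ($line =~ /\b($team_regex:.*)/) {
        push @kept, $1;
    }
}

print "$_\n" for @kept;
```

With these sample lines it prints the BJ, DS, AP and JK entries without the date, time and sender, and skips the "yeap" and NW lines.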