Splitting regex matches by newlines in Perl


I am trying to glob files from a directory and print out regexp matches. I want to match everything between these two tags:

 <110> 
    *everything here*
 <120>

My matches would be

SCHALLY, ANDREW V. 
CAI, REN ZHI
      ZARANDI, MARTA

However, when I try to split this on newlines and join the pieces with "|", I do not get the desired output, which is

Applicant :  SCHALLY, ANDREW V. | CAI, REN ZHI | ZARANDI, MARTA

My current output is only

 |        ZARANDI, MARTA

Can someone see any obvious mistakes?

#!/usr/bin/perl
use warnings;
use strict;
use IO::Handle;

open (my $fh, '>', '../logfile.txt')  || die "can't open logfile.txt";
open (STDERR, ">>&=", $fh)         || die "can't redirect STDERR";
$fh->autoflush(1);



my $input_path = "../input/";
my $output_path = "../output/";
my $whole_file;

opendir INPUTDIR, $input_path or die "Cannot find dir $input_path : $!";
my @input_files = readdir INPUTDIR;
closedir INPUTDIR;

foreach my $input_file  (@input_files) 
{   
    $whole_file = &getfile($input_path.$input_file); 
    if ($whole_file){
        $whole_file =~  /[<][1][1][0][>](.*)[<][1][2][0][>]/s ;
        if ($1){
            my $applicant_string = "Applicant : $1";
            my $op = join( "|", split("\n", $applicant_string) );
            print $op; 
        }
    }
}

close $fh;




sub getfile {
    my $filename = shift;
    open F, "< $filename " or die "Could not open $filename : $!" ;
    local $/ = undef; 
    my $contents = <F>;
    close F;
    return $contents;
}

EDIT 1

I ran the code on a single file:

#!/usr/bin/perl
use warnings;
use strict;
use IO::Handle;


my $input_file = "01.txt-WO13_090919_PD_20130620";
my $input_path = "../input/";

my $whole_file = &getfile($input_path.$input_file); 


if ($whole_file =~  /[<][1][1][0][>](.*)[<][1][2][0][>]/s ) {
        print $1;
            my @split_string = split("\n", $1);
            my $new_string =  join("|", @split_string) ;
            print "$new_string \n";
}



sub getfile {
    my $filename = shift;
    open F, "< $filename " or die "Could not open $filename : $!" ;
    local $/ = undef; 
    my $contents = <F>;
    close F;
    return $contents;
}

Output

  Chen, Guokai
       Thomson, James
       Hou, Zhonggang

        Hou, Zhonggang

Solution

  • I ran your code and got

    |SCHALLY, ANDREW V. |CAI, REN ZHI|      ZARANDI, MARTA
    

    That is pretty close. All you need to do is trim the whitespace from each line before you join. So replace this:

     my @split_string = split("\n", $1);
     my $new_string =  join("|", @split_string) ;
    

    With this:

     my @split_string = split("\n", $1);
     my @names;
     foreach my $name ( @split_string ) {
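       $name =~ s/^\s+|\s+$//g;   # trim leading and trailing whitespace (a greedy .* would keep the trailing part)
       next if $name eq '';       # skip lines that were empty or whitespace-only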
       push @names, $name;
     }
    
     my $new_string =  join("|", @names);
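
A minimal, self-contained sketch of the trim-then-join idea above, for anyone who wants to try it without the input files. It assumes the input may use Windows (CRLF) line endings; a stray carriage return is one plausible reason a terminal would show only the last name, though the question does not confirm this. The helper name join_names and the sample string are hypothetical and not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes the text captured between <110> and <120>
# and returns the non-empty lines, trimmed and joined with " | ".
sub join_names {
    my ($block) = @_;
    my @names;
    # Split on LF or CRLF so Windows line endings leave no stray "\r".
    for my $line ( split /\r?\n/, $block ) {
        $line =~ s/^\s+|\s+$//g;    # trim both ends
        next if $line eq '';        # skip blank lines
        push @names, $line;
    }
    return join ' | ', @names;
}

# Sample text modelled on the question; the CRLF endings are an assumption.
my $sample = "<110> SCHALLY, ANDREW V. \r\n      CAI, REN ZHI\r\n      ZARANDI, MARTA\r\n<120>";

# Test the match itself rather than $1, so a stale capture is never reused.
if ( $sample =~ /<110>(.*?)<120>/s ) {
    print "Applicant : ", join_names($1), "\n";
}
```

With the sample above this should print the single line the question asks for. Checking the match in the `if` condition, as the single-file version in the question already does, avoids printing a leftover `$1` from an earlier file when the current one has no `<110>` section.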