I am trying to read every file in a directory and print regexp matches. I want to capture everything between these two markers:
<110>
*everything here*
<120>
The matched text would be:
SCHALLY, ANDREW V.
CAI, REN ZHI
ZARANDI, MARTA
However, when I try to split this on newlines and join the pieces with "|", I do not get the desired output, which is:
Applicant : SCHALLY, ANDREW V. | CAI, REN ZHI | ZARANDI, MARTA
My current output is only
| ZARANDI, MARTA
Can someone see any obvious mistakes?
#!/usr/bin/perl
use warnings;
use strict;
use IO::Handle;
open (my $fh, '>', '../logfile.txt') || die "can't open logfile.txt";
open (STDERR, ">>&=", $fh) || die "can't redirect STDERR";
$fh->autoflush(1);
my $input_path = "../input/";
my $output_path = "../output/";
my $whole_file;
opendir INPUTDIR, $input_path or die "Cannot find dir $input_path : $!";
my @input_files = readdir INPUTDIR;
closedir INPUTDIR;
foreach my $input_file (@input_files)
{
    $whole_file = &getfile($input_path.$input_file);
    if ($whole_file){
        $whole_file =~ /[<][1][1][0][>](.*)[<][1][2][0][>]/s ;
        if ($1){
            my $applicant_string = "Applicant : $1";
            my $op = join( "|", split("\n", $applicant_string) );
            print $op;
        }
    }
}
close $fh;
sub getfile {
    my $filename = shift;
    open F, "< $filename " or die "Could not open $filename : $!" ;
    local $/ = undef;
    my $contents = <F>;
    close F;
    return $contents;
}
I ran the code on a single file:
#!/usr/bin/perl
use warnings;
use strict;
use IO::Handle;
my $input_file = "01.txt-WO13_090919_PD_20130620";
my $input_path = "../input/";
my $whole_file = &getfile($input_path.$input_file);
if ($whole_file =~ /[<][1][1][0][>](.*)[<][1][2][0][>]/s ) {
    print $1;
    my @split_string = split("\n", $1);
    my $new_string = join("|", @split_string) ;
    print "$new_string \n";
}
sub getfile {
    my $filename = shift;
    open F, "< $filename " or die "Could not open $filename : $!" ;
    local $/ = undef;
    my $contents = <F>;
    close F;
    return $contents;
}
Output:
Chen, Guokai
Thomson, James
Hou, Zhonggang
Hou, Zhonggang
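I ran your code and got
|SCHALLY, ANDREW V. |CAI, REN ZHI| ZARANDI, MARTA
which is pretty close. All you need to do is trim the whitespace from each piece before you join. If your input files have Windows line endings, each piece also ends in a carriage return, which would explain why your terminal shows only the last name. So replace this: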
my @split_string = split("\n", $1);
my $new_string = join("|", @split_string) ;
With this:
my @split_string = split("\n", $1);
my @names;
foreach my $name ( @split_string ) {
    # Trim both ends. \s also matches a trailing carriage return (\r).
    $name =~ s/^\s+|\s+$//g;
    next if $name =~ /^$/;    # skip empty pieces
    push @names, $name;
}
my $new_string = join("|", @names);
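To check the trimming without touching your input directory, here is a minimal self-contained sketch. The sample string is made up for illustration and assumes Windows (CRLF) line endings plus some stray spaces; join_applicants is a hypothetical helper name, not something from your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: CRLF line endings and stray spaces (assumption).
my $whole_file = "<110>\r\n  SCHALLY, ANDREW V. \r\nCAI, REN ZHI\r\n ZARANDI, MARTA\r\n<120>\r\n";

# join_applicants is a helper name invented for this sketch.
sub join_applicants {
    my ($text) = @_;

    # Capture everything between the <110> and <120> markers.
    return unless $text =~ /<110>(.*?)<120>/s;

    my @names;
    for my $name ( split /\n/, $1 ) {
        $name =~ s/^\s+|\s+$//g;     # trim both ends; \s also matches \r
        next unless length $name;    # skip pieces that were only whitespace
        push @names, $name;
    }
    return join ' | ', @names;
}

my $joined = join_applicants($whole_file);
print "Applicant : $joined\n" if defined $joined;
```

With this sample input the script prints the line you were after, "Applicant : SCHALLY, ANDREW V. | CAI, REN ZHI | ZARANDI, MARTA". Adding the "Applicant : " prefix after the join, rather than before the split as in your first script, also keeps it from being glued to the first piece.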