I've tested my program on a dozen Windows machines, half a dozen Macs, and one Linux machine. It works without error on Windows and Linux, but not on the Macs. My program is designed to work with protein database files, which are text files ranging from 250MB to 10GB. I took one tenth of the 250MB file to make a sample file for debugging, but the error did not occur with the smaller file.
I've narrowed the bug down to this section of code, in which $tempFile is the protein database file:
open(ps_file, "..".$slash."dataset".$slash.$tempFile)
    or die "couldn't open $tempFile";
while(<ps_file>){
    chomp;
    my @curLine = split(/\t/, $_);
    my $filter = 1;
    if($taxon){
        chomp($curLine[2]);
        print "line2 ".$curLine[2].",\t".$taxR{$curLine[2]}."\n";
        $filter = $taxR{$curLine[2]};
    }
    if($filter){
        checkSeq(@curLine);
    }
}
This is a screenshot of the output of that print statement showing special characters:
This is what the output looks like on a Windows Machine:
Here is an example of one line from $tempFile:
>sp|P48255|ABCX_CYAPA Probable ATP-dependent transporter ycf16 OS=Cyanophora paradoxa GN=ycf16 PE=3 SV=1 MSTEKTKILEVKNLKAQVDGTEILKGVNLTINSGEIHAIMGPNGSGKSTFSKILAGHPAYQVTGGEILFKNKNLLELEPEERARAGVFLAFQYPIEIAGVSNIDFLRLAYNNRRKEEGLTELDPLTFYSIVKEKLNVVKMDPHFLNRNVNEGFSGGEKKRNEILQMALLNPSLAILDETDSGLDIDALRIVAEGVNQLSNKENSIILITHYQRLLDYIVPDYIHVMQNGRILKTGGAELAKELEIKGYDWLNELEMVKK CYAPA
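The problem probably lies in inconsistent line endings. chomp removes only the current input record separator (by default "\n"), so if the file has CRLF line endings and is read on a system that does not translate them, a "\r" is left at the end of each line. If, as I suspect, trailing whitespace is not significant, you're better off removing all of it instead of chomping.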
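To illustrate the difference, here is a minimal sketch. The sample line and the contents of %taxR are made up for the demonstration, and it assumes the file has CRLF line endings, which the question does not confirm.

```perl
use strict;
use warnings;

# Simulated line as it would arrive from a CRLF file on a system that
# does not translate line endings. The field values are illustrative only.
my $raw = "sp|P48255\tABCX_CYAPA\tCYAPA\r\n";

# chomp removes only the trailing "\n" (the default value of $/).
my $chomped = $raw;
chomp $chomped;
my @chomped_fields = split /\t/, $chomped;

# Stripping all trailing whitespace removes the "\r" as well.
my $stripped = $raw;
$stripped =~ s/\s+\z//;
my @stripped_fields = split /\t/, $stripped;

# Stand-in for the real %taxR lookup table, keyed on the clean value.
my %taxR = (CYAPA => 1);

printf "after chomp: found=%s\n",
    exists $taxR{ $chomped_fields[2] } ? 'yes' : 'no';
printf "after s///:  found=%s\n",
    exists $taxR{ $stripped_fields[2] } ? 'yes' : 'no';
```

After chomp the third field is still "CYAPA\r", so the hash lookup misses. After the substitution it is "CYAPA" and the lookup succeeds.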
Also note:
Bareword filehandles such as ps_file are package global variables that are subject to action at a distance. Use lexical filehandles instead.
Use File::Spec or Path::Class to handle file paths in a platform-independent way.
Include the full file path and the error message if there is an error opening a file.
In
chomp;
my @curLine = split(/\t/, $_);
my $filter = 1;
if($taxon){
    chomp($curLine[2]);
$curLine[2] comes from a string that was read in as a line and already chomped. I don't see why you are chomping it again.
Here's a tidied-up version of your code snippet:
use File::Spec::Functions qw( catfile );

my $input_file = catfile('..', dataset => $tempFile);

open my $ps_file, '<', $input_file
    or die "couldn't open '$input_file': $!";

while (my $line = <$ps_file>) {
    $line =~ s/\s+\z//; # remove all trailing whitespace, including "\r"
    my @curLine = split /\t/, $line;
    my $filter = 1;
    if ($taxon) {
        my $field = $curLine[2];
        $filter = $taxR{ $field };
        print join("\t", "line2 $field", $filter), "\n";
    }
    if ($filter) {
        checkSeq(@curLine);
    }
}
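If you want to confirm that stray characters are the cause before changing anything, one option is to print the field with non-printable characters escaped. This is only a sketch: show_invisible is a hypothetical helper, not part of your program or of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside printable ASCII
# with a \x{..} escape so that stray control characters become visible.
sub show_invisible {
    my ($text) = @_;
    (my $visible = $text) =~ s/([^\x20-\x7E])/sprintf('\\x{%02X}', ord $1)/ge;
    return $visible;
}

# A field with a leftover carriage return, as suspected above.
print show_invisible("CYAPA\r"), "\n";   # prints CYAPA\x{0D}
```

Calling it on $curLine[2] inside the loop on one of the Macs should show whether a \x{0D} (carriage return) or some other unexpected character is attached to the value.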