Definition error when using columns in one file to find matching columns in another file with Perl

I have a tab-delimited input file in the format:

+    Chr1    www
-    Chr2    zzz

I would like to check it line by line against a tab-delimited reference file (TRANSCRIPTS in the code below) in the format:

Chr1    +    xxx    UsefulInfo1
Chr2    -    yyy    UsefulInfo2

And would like an output that looks like:

+    Chr1    UsefulInfo1
-    Chr2    UsefulInfo2

Here is my attempt to take file names from the command line, grab certain info from the input file, and append the useful info from the reference file:


use strict;
use warnings;
use diagnostics;

my $inFile = $ARGV[0];
my $outFile = $ARGV[1];

open(INFILE, "<$inFile") || die("Couldn't open $inFile: $!\n");
open(OUTFILE, ">$outFile") || die("Couldn't create $outFile: $!\n");

open(TRANSCRIPTS, "</path/TranscriptInfo") || die("Couldn't open reference file!");
my @transcripts = split(/\t+/, <TRANSCRIPTS>);
chomp @transcripts;

#Define desired information from input for later
while (my @columns = split(/\t+/, <INFILE>)) {
    chomp @columns;
    my $strand = $columns[0];
    my $chromosome = $columns[1];

    #Attempt to search reference file line by line for matching criteria and copying a column of matching lines
    foreach my $reference(@transcripts) {
        my $refChr = $reference[0]; #Error for this line
        my $refStrand = $reference[1]; #Error for this line
        if ($refChr eq $chromosome && $refStrand eq $strand) {
            my $info = $reference[3]; #Error for this line
            print OUTFILE "$strand\t$chromosome\t\$info\n";
        }
    }
}
close(OUTFILE); close(INFILE);

At the moment I receive "Global symbol "@reference" requires explicit package name." What is the proper way to define this? I'm not even sure my foreach loop will work as intended once the symbol is defined properly.


  • Fixed:

    use strict;
    use warnings;
    use feature qw( say );

    my $in_qfn          = $ARGV[0];
    my $out_qfn         = $ARGV[1];
    my $transcripts_qfn = "/path/TranscriptInfo";

    # Load every line of the reference file as an array reference of its fields.
    my @transcripts;
    {
       open(my $transcripts_fh, "<", $transcripts_qfn)
          or die("Can't open \"$transcripts_qfn\": $!\n");
       while (<$transcripts_fh>) {
          chomp;
          push @transcripts, [ split(/\t/, $_, -1) ];
       }
    }

    {
       open(my $in_fh, "<", $in_qfn)
          or die("Can't open \"$in_qfn\": $!\n");
       open(my $out_fh, ">", $out_qfn)
          or die("Can't create \"$out_qfn\": $!\n");
       while (<$in_fh>) {
          chomp;
          my ($strand, $chr) = split(/\t/, $_, -1);
          for my $transcript (@transcripts) {
             my $ref_chr    = $transcript->[0];
             my $ref_strand = $transcript->[1];
             if ($chr eq $ref_chr && $strand eq $ref_strand) {
                # The useful info is the fourth column of the reference file.
                my $info = $transcript->[3];
                say $out_fh join("\t", $strand, $chr, $info);
             }
          }
       }
    }
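
    The error in the question comes down to sigils: $reference[0] means element 0 of an array named @reference, which was never declared, whereas the loop variable is the scalar $reference. Note also that split(/\t+/, <TRANSCRIPTS>) reads a single line, so @transcripts there holds the fields of the first line only. The following is a minimal sketch of the row-per-array-reference idea, using made-up in-memory lines in place of the reference file; the names @lines, @rows and @infos are illustrative and not part of the original code.

    ```perl
    use strict;
    use warnings;

    # Hypothetical stand-in for the lines of the reference file.
    my @lines = ("Chr1\t+\txxx\tUsefulInfo1\n", "Chr2\t-\tyyy\tUsefulInfo2\n");

    # Store each line as a reference to an array of its fields.
    my @rows;
    for my $line (@lines) {
       chomp(my $copy = $line);
       push @rows, [ split(/\t/, $copy, -1) ];
    }

    # Each element of @rows is a scalar holding an array reference,
    # so a field is reached with $row->[index], not $row[index].
    my @infos;
    for my $row (@rows) {
       push @infos, $row->[3];
    }
    print join(",", @infos), "\n";
    ```

    Writing $row[3] inside that last loop would fail under strict with the same "Global symbol" message, because it refers to an undeclared array @row.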

    That said, the above is very inefficient. Let N be the number of lines in $transcripts_qfn and M the number of lines in $in_qfn. The body of the inner loop executes N*M times. It only needs to execute N times if the strand/chromosome pairs from the input file are stored in a hash first.

    use strict;
    use warnings;
    use feature qw( say );

    my $in_qfn          = $ARGV[0];
    my $out_qfn         = $ARGV[1];
    my $transcripts_qfn = "/path/TranscriptInfo";

    # Record every strand/chromosome pair requested by the input file.
    my %to_print;
    {
       open(my $in_fh, "<", $in_qfn)
          or die("Can't open \"$in_qfn\": $!\n");
       while (<$in_fh>) {
          chomp;
          my ($strand, $chr) = split(/\t/, $_, -1);
          ++$to_print{$strand}{$chr};
       }
    }

    {
       open(my $transcripts_fh, "<", $transcripts_qfn)
          or die("Can't open \"$transcripts_qfn\": $!\n");
       open(my $out_fh, ">", $out_qfn)
          or die("Can't create \"$out_qfn\": $!\n");
       while (<$transcripts_fh>) {
          chomp;
          # Skip the third column; the useful info is in the fourth.
          my ($ref_chr, $ref_strand, undef, $info) = split(/\t/, $_, -1);
          next if !$to_print{$ref_strand};
          next if !$to_print{$ref_strand}{$ref_chr};
          say $out_fh join("\t", $ref_strand, $ref_chr, $info);
       }
    }
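
    The hash-lookup approach can also be tried without any files. The following is a minimal sketch under that assumption: match_transcripts is a hypothetical helper, not part of the code above, that takes the two sets of lines as array references and returns the output lines instead of writing them. The sample data follows the formats in the question, plus one made-up non-matching reference line.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: takes two array refs of tab-delimited lines and
    # returns the matching output lines.
    sub match_transcripts {
       my ($in_lines, $transcript_lines) = @_;

       # Pass 1 (M steps): remember which strand/chromosome pairs were requested.
       my %wanted;
       for my $line (@$in_lines) {
          chomp(my $copy = $line);
          my ($strand, $chr) = split(/\t/, $copy, -1);
          $wanted{$strand}{$chr} = 1;
       }

       # Pass 2 (N steps): one hash lookup per reference line.
       my @out;
       for my $line (@$transcript_lines) {
          chomp(my $copy = $line);
          my ($ref_chr, $ref_strand, undef, $info) = split(/\t/, $copy, -1);
          next if !$wanted{$ref_strand};
          next if !$wanted{$ref_strand}{$ref_chr};
          push @out, join("\t", $ref_strand, $ref_chr, $info);
       }
       return @out;
    }

    my @result = match_transcripts(
       [ "+\tChr1\twww\n", "-\tChr2\tzzz\n" ],
       [ "Chr1\t+\txxx\tUsefulInfo1\n",
         "Chr2\t-\tyyy\tUsefulInfo2\n",
         "Chr3\t+\tqqq\tUnused\n" ],
    );
    print "$_\n" for @result;
    ```

    One difference from the nested-loop version is worth keeping in mind: the output follows the order of the reference file, and a reference line is emitted at most once even if the input file lists the same strand/chromosome pair several times.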