
How can I parse a file for a match and print the string that precedes the matched string in Perl?


I'm trying to parse a GBK (GenBank) file. I need to return the locus tag and product name of every gene whose product matches a pattern. For example, to find all predicted gene products, the search word "predicted" should return:

/product="predicted semialdehyde dehydrogenase"
/locus_tag="ECDH10B_2481"

I've been able to return the /product line, but I can't figure out how to parse "backwards" to grab the /locus_tag.

Here's what I have so far:

my $fasta_file = 'example.txt';
open(INPUT, $fasta_file) || die "ERROR: can't read input FASTA file: $!";
while (<INPUT>) {
    if (/predicted/) {
        print $_;
    }
}

Contents of example.txt:

 gene            complement(2525423..2526436)
                 /gene="usg"
                 /locus_tag="ECDH10B_2481"
 CDS             complement(2525423..2526436)
                 /gene="usg"
                 /locus_tag="ECDH10B_2481"
                 /codon_start=1
                 /transl_table=11
                 /product="predicted semialdehyde dehydrogenase"
                 /protein_id="ACB03477.1"
                 /db_xref="GI:169889770"
                 /db_xref="ASAP:AEC-0002184"
                 /translation="MSEGWNIAVLGATGAVGEALLETLAERQFPVGEIYALARNESAG
                 EQL"
 gene            complement(2526502..2527638)
                 /gene="pdxB"
                 /locus_tag="ECDH10B_2482"
 CDS             complement(2526502..2527638)
                 /gene="pdxB"
                 /locus_tag="ECDH10B_2482"
                 /codon_start=1
                 /transl_table=11
                 /product="erythronate-4-phosphate dehydrogenase"
                 /protein_id="ACB03478.1"
                 /db_xref="GI:169889771"
                 /db_xref="ASAP:AEC-0002185"
                 /translation="MKILVDENMPYARDLFSRLGEVTAVPGRPIPVAQLADADALMVR
                 SVTKVNESLLAGKPIKFVGTATAGTDHVDEAWLKQAGIGFSAAP"
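Note that print $_ in the loop above outputs the whole qualifier line, indentation and quotes included. If only the product name itself is wanted, a capture group can pull the quoted value out. This is a minimal sketch, assuming each /product qualifier fits on one line (long values can wrap in real GenBank files); the sub name matching_products is hypothetical, and the lines are read from an in-memory copy of the sample rather than from example.txt:

```perl
use strict;
use warnings;

# Return the /product values that contain $word, reading lines from $fh.
sub matching_products {
    my ($fh, $word) = @_;
    my @found;
    while (my $line = <$fh>) {
        # \Q...\E stops regex metacharacters in $word from being interpreted.
        push @found, $1 if $line =~ m{/product="([^"]*\Q$word\E[^"]*)"};
    }
    return @found;
}

# A few lines copied from example.txt, read through an in-memory filehandle.
my $sample = <<'END';
                 /locus_tag="ECDH10B_2481"
                 /product="predicted semialdehyde dehydrogenase"
                 /locus_tag="ECDH10B_2482"
                 /product="erythronate-4-phosphate dehydrogenase"
END

open my $fh, '<', \$sample or die "can't open in-memory file: $!";
print "$_\n" for matching_products($fh, 'predicted');
```

Run as-is, this should print just predicted semialdehyde dehydrogenase, without the /product=" prefix.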

Solution

  • Just remember the last locus tag you encountered, and print it when you find a "predicted" match:

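    #!/usr/bin/perl
    use warnings;
    use strict;

    my $fasta_file = 'example.txt';
    open my $INPUT, '<', $fasta_file or die "ERROR: can't read input file $fasta_file: $!";

    my $locus_tag;    # most recent /locus_tag line seen so far
    while (<$INPUT>) {
        if (/\/locus_tag=/) {
            # Remember it; a later /product line in the same feature needs it
            $locus_tag = $_;
        } elsif (/\/product=.*predicted/) {
            print;                                   # the matching /product line
            print $locus_tag if defined $locus_tag;  # the tag remembered above
        }
    }
    close $INPUT;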
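The last-seen approach assumes that /locus_tag comes before /product inside each feature. A variation that groups qualifiers by feature removes that assumption, and because it only tests the /product value, the search word showing up in a /note or /translation line is ignored. This is a sketch under stated assumptions, not a full GenBank parser: it treats any line of the form "key, two or more spaces, location" (such as " CDS             complement(...)") as the start of a new feature, and it only reads qualifiers whose value fits on one line. The sub name report_matches is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect single-line qualifiers per feature and return [locus_tag, product]
# for each feature whose /product value contains $word. Qualifier order inside
# a feature does not matter. Assumes /product and /locus_tag each fit on one
# line; wrapped values are not joined.
sub report_matches {
    my ($fh, $word) = @_;
    my (@hits, %feature);

    # Test the feature collected so far, then start a fresh one.
    my $flush = sub {
        if (defined $feature{product} && $feature{product} =~ /\Q$word\E/) {
            push @hits, [ $feature{locus_tag} // 'unknown', $feature{product} ];
        }
        %feature = ();
    };

    while (my $line = <$fh>) {
        if ($line =~ m{^\s*/(\w+)="([^"]*)"\s*$}) {
            # Complete qualifier on one line, e.g. /product="..."
            $feature{$1} = $2;
        } elsif ($line =~ m{^\s*([^\s/]\S*)\s{2,}\S}) {
            # Feature key plus location, e.g. "CDS   complement(...)"
            $flush->();
        }
    }
    $flush->();    # the last feature has no key line after it
    return @hits;
}

# A trimmed copy of example.txt, read through an in-memory filehandle.
my $sample = <<'END';
 gene            complement(2525423..2526436)
                 /locus_tag="ECDH10B_2481"
 CDS             complement(2525423..2526436)
                 /locus_tag="ECDH10B_2481"
                 /product="predicted semialdehyde dehydrogenase"
                 /translation="MSEGWNIAVLGATGAVGEALLETLAERQFPVGEIYALARNESAG
                 EQL"
 CDS             complement(2526502..2527638)
                 /locus_tag="ECDH10B_2482"
                 /product="erythronate-4-phosphate dehydrogenase"
END

open my $INPUT, '<', \$sample or die "can't open sample: $!";
for my $hit (report_matches($INPUT, 'predicted')) {
    my ($locus_tag, $product) = @$hit;
    print qq{/product="$product"\n/locus_tag="$locus_tag"\n};
}
```

To run it on the real file, open example.txt instead of the in-memory sample (open my $INPUT, '<', 'example.txt'). For complete GenBank parsing, including wrapped qualifier values, BioPerl's Bio::SeqIO module with the genbank format is the usual tool.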